Teaching Morse-Code Reception with Signals Weighted
in Frequency According to Their Difficulty¹


Murray Sidman,² Fred S. Keller, Edmund J. Kennedy, Maurice P. Wilson


Columbia University 


A number of studies, cited in a recent re- 
view (7), have revealed characteristic errors 
in learning to receive International Morse 
code, and have shown that certain signals are 
more difficult to discriminate than others. 
This has led to the suggestion that the fre- 
quency of signal presentation, during train- 
ing, be manipulated in accordance with the 
relative difficulty of the signals (4, 6). Al- 
though an experimental test of this procedure 
was attempted by Morsh, Stannard, and Gra- 
ham (5), their results were admittedly in- 
conclusive. 

The present study was designed in such a 
way that, for some subjects (the experimental 
group), the greater the difficulty of the signal 
the greater the number of times it was pre- 
sented during training, whereas, for others 
(the control group), all signals were presented 
with equal frequency. The weighting pro- 
cedure was necessarily arbitrary, since there 
are no unassailable a priori grounds for any 
specific method. It was expected, however, 
that the range of frequencies employed would 
be sufficient to test the common-sense assump- 
tion that weightings positively correlated with 
difficulty during training should speed up the 
learning of Morse code. 


Method 


Subjects and apparatus. The subjects (Ss) were paid volunteers, all students at the Columbia University 1952 summer session. None of them claimed any experience with Morse code beyond memorizing the alphabet, from the printed page, during past membership in the Boy Scouts.

¹ This research was made possible by support extended to Columbia University by the Air Force Human Factors Operations Research Laboratories under Contract AF 18(600)-115.

² Now of the Army Medical Service Graduate School, Walter Reed Army Medical Center.

The signals used during each practice session were 
punched on Wheatstone tape and transmitted to the 
Ss by means of a Boehme Primary Training Keyer 
with a built-in speaker. No earphones were used. 
Speed of transmission was varied by changing the 
durations of between-signal and between-group 
spaces. The signal speed itself was held constant at 
20 groups per minute (GPM) as measured by the 
standard five-letter group, pride. 

Procedure. When the Ss were chosen, they were 
given their choice of one of two sections. Subse- 
quently, at the start of the experiment, one of these 
sections became the experimental group and the 
other the control. At the first meeting of each section, a code aptitude test³ was administered. At
the end of the experiment, the 19 remaining Ss of 
the experimental group were matched in test per- 
formance with an equal number of control Ss on 
the basis of group means and standard deviations. 
The matches were made by a colleague who was un- 
aware of the meaning of the data he was matching. 
The results presented below are those obtained from 
the 38 Ss so selected. 

Signals were presented to Ss in training and test runs of 100 signals each, with a short rest period between runs. The code-voice procedure was used in training. This involved, basically, the presentation of a signal and, after a pause of about 3.5 sec., its voiced identification. In the interval between signal and voice, S tried to write on his practice sheet the appropriate letter or digit, and, in the 1- or 2-sec. pause before the next signal came, he corrected his sheet for errors of substitution or omission. In the rest period between runs, he counted his errors and entered the total on a "box-score" record of his daily performance. Instruction in the use of practice and box-score sheets was given prior to the first code-voice run, together with a single preliminary identification of all the signals using the code-voice procedure.

³ The aptitude test, tentatively named the "dot-counting test," has yielded sufficiently high correlations with certain measures of progress to warrant further development. Its essential feature is the use of signals which have been found to produce "dotting errors" (2, 3), with subjects being asked only to record the number of dots in each signal.

Table 1
Rank Order of Difficulty and Frequency Weightings for 36 Morse Code Characters
* Taken from Keller and Taubman (4).

Frequency weightings were assigned according to 
the rank order of difficulty reported by Keller and 
Taubman (4). This order and the assigned weights 
are presented in Table 1. The signal for E, to be 
presented least often, was given a weighting of 1. 
The weighting for the other signals indicates the 
number of times they were presented relative to 
presentations of E.⁴ Throughout all training runs,
the experimental group was exposed to the signals 
with the frequency of presentation determined by 
the assigned weights, whereas the control group was 
presented with each signal an equal number of 
times. The practice material was generated in a 
way that provided for the presentation of all sig- 
nals at least once within any succession of 183 sig- 
nals for the experimental group. (Each signal was 
presented 5 times within any succession of 180 sig- 
nals for the control group.) 
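The presentation constraint described above can be illustrated with a short program. This is a minimal sketch under our own assumption, not the authors' documented method: a single shuffled master block, containing each signal as many times as its weight, is repeated end to end and cut into runs of 100. Under that scheme any window of len(block) consecutive signals is a rotation of the block, so every signal appears in it. The function names and the experimental weights are hypothetical; they are not the Table 1 values.

```python
import random
from collections import Counter
from itertools import cycle, islice
from string import ascii_uppercase, digits


def master_block(weights, seed=0):
    """Shuffle one block in which each signal occurs `weight` times."""
    block = [sig for sig, w in weights.items() for _ in range(w)]
    random.Random(seed).shuffle(block)
    return block


def make_runs(weights, n_runs, run_length=100, seed=0):
    """Cut consecutive runs from the master block repeated end to end.

    Any window of len(block) consecutive signals in the repeated stream is a
    rotation of the block, so it contains every signal exactly `weight` times.
    """
    stream = cycle(master_block(weights, seed))
    return [list(islice(stream, run_length)) for _ in range(n_runs)]


# Control group: 36 characters, 5 presentations each, so blocks of 180.
control = {ch: 5 for ch in ascii_uppercase + digits}
# Hypothetical experimental weights (NOT the Table 1 values).
experimental = {"E": 1, "T": 2, "M": 4, "H": 10, "5": 10}

runs = make_runs(control, n_runs=3)
flat = [sig for run in runs for sig in run]
window = flat[40:220]  # any 180 consecutive signals
print(Counter(window) == Counter(control))  # True
```

With the 36 weights of Table 1 in place of the hypothetical dictionary, the same call would yield blocks of 183 signals for the experimental group.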

At the end of each daily session of practice runs, 
each group was given one or more test runs, in 
which the signals were not weighted and were not 
identified for Ss. These runs, identical for the two 
groups, were scored by the experimenters, who re- 
ported error totals to Ss at the next session. 

As training progressed, the time intervals between signal and voice, and between voice and the next signal, in some of the code-voice runs were decreased, and the speed of some of the test runs was increased. These variations were made in order to keep up the morale of the better students and to provide what were expected to be more sensitive measures of progress. Table 2 gives the number of code-voice and test runs, and their speeds, during each session.

⁴ The range of frequency weightings in this study closely approximates the 2-10 range recommended by Seashore and Kurtz (6), but their assignment of specific values to signals was based upon a different rank order of difficulty.

Experimental sessions were held on five successive 
days of each week, and each session was of 50- 
minute duration. No more than two absences were 
permitted any S during the entire experiment. 


Results and Discussion 


On the code aptitude test, the difference 
between the mean number of errors for the 
two groups was .5, yielding a t value of 0.15.

In Fig. 1, the median number of errors on 
the 4-GPM test runs is plotted for the two 
groups. Although fewer errors are shown for 
the experimental group at the beginning and 
the control group at the end, the differences 
are clearly too small to be of any practical 
importance in connection with code-school 
procedures. 

It is possible that the 4-GPM test runs were mastered so rapidly by the better students that tests at this speed did not provide a sensitive measure. Figure 2 shows, however, that test runs at 5 GPM do not differentiate the groups, while at 6 GPM a slight initial superiority of the experimental group is eliminated with practice. Only two runs were given at 7 GPM, and the results are equivocal: the median error scores for the experimental group were 36 and 16, compared with corresponding control medians of 27.5 and 17.

Table 2
Number and Speed of Code-Voice and Test Runs in Each Session
(Columns: Code-Voice Runs: Number, Seconds Between Signal and Voice; Tests: Number, Speed.)

These results tell us that signal weighting 
in accordance with difficulty ranking does not 
facilitate progress as measured by equal-fre- 
quency test runs. There remains the question 
of the effect of weighting upon progress in the 
code-voice runs themselves as measured re- 
gardless of frequency. Figure 3 answers this 
question. It shows that the progress of the 
experimental group was retarded but that 
eventually the two groups reached the same 
proficiency level. 

The findings of this experiment are in line with those reported by Jerome and Keller (1), who found that special drill on frequently confused pairs of signals did not increase the proficiency of students in learning to receive high-speed code. Jerome and Keller were concerned mainly with the "dotting error" (2, 3), and their failure to eliminate signal confusions through drill may be related to individual limits of discriminative capacity in their Ss. In the present study, the failure of signal weighting to speed up progress must be assigned to some other cause than mere stimulus generalization, but no satisfactory explanation suggests itself at this time.

Fig. 1. Median error scores for both groups on the 4-GPM test runs presented in consecutive order. The open circles represent the scores for the experimental group. Refer to Table 2 to relate the runs to the daily sessions.
Fig. 2. Median error scores for both groups on the 5-GPM (A), 6-GPM (B), and 7-GPM (C) test runs presented in consecutive order. The open circles represent the scores for the experimental group. Refer to Table 2 to relate the runs to the daily sessions.

Fig. 3. Median error scores for both groups on the code-voice runs presented in consecutive order. The open circles represent the scores for the experimental group. Refer to Table 2 to relate the runs to the daily sessions and to determine the interval between signals.

Summary 


Two groups of subjects were trained to re- 
ceive International Morse code. An experi- 
mental group was presented with signals that 
were weighted, with respect to their fre- 
quency, in rough proportion to their order of 
difficulty for beginners. A control group was 
presented with each signal an equal number 
of times. Both groups were given frequent 
tests on the same nonweighted sets of signals. 

No practically significant differences were observed in the progress of the two groups as measured by the test runs. The only consistent difference was a slight retardation in progress of the experimental group in their code-voice runs.


Received February 15, 1954. 
Dimensional Analysis of Motion: VIII.
The Role of Visual Discrimination in Motion Cycles¹


J. Richard Simon and Robert C. Smader 


University of Wisconsin 


It is a widely recognized fact that visual
discrimination plays several different roles in 
determining human response. The general 
process of discrimination in judgmental and 
esthetic response is well known. Recent 
studies in the field of human engineering 
specify discriminative behavior in terms of 
the properties of visual displays as they af- 
fect operator performance. Terms such as 
population stereotype (3), stimulus-response 
compatibility (4), and spatial stimulus-re- 
sponse correspondence (8) are used to desig- 
nate the correlation between the different 
dimensional properties of the control move- 
ments of the operator and the dimensional 
properties of the stimulus environment. An- 
other aspect of the problem of perception and 
motion is the question of how the coordina- 
tion of movement within a complex motion 
cycle is affected by a specific discrimination 
within it. 

The present study is concerned with the ef- 
fect of a specific visual discrimination on the 
interrelation of component movements in a 
complex motion pattern. For example, one 
may ask, what effect does the introduction of 
a visual discrimination into a motion pattern 
have on the duration of the entire motion as 
well as on its parts or component movements? 
Or, stated differently, is the effect of the visual 
discrimination specific to one phase of the mo- 
tion cycle or does the discrimination affect 
other or all phases of the cycle? 

Time and motion studies have attempted to handle the problem of discrimination and motion. In this field a discrimination represents a separate, distinct operation. For example, the therblig (5) "select" represents a discriminative response consisting of the selection of one part from a number of similar parts. Is visual discriminative response a separate unitary subdivision of the motion cycle, as this procedure implies, or does a visual discrimination affect other components of the motion cycle as well? It is precisely this question that the present study proposes to answer.

¹ This research was supported by funds voted by the Legislature of the State of Wisconsin, and assigned by the Graduate School Research Committee, the University of Wisconsin.

In conducting this study we have attempted not only to collect data on the question noted above, but also to determine the relation between the discriminative process in the motion and other factors that are important in defining the duration of the movement. In particular, learning and travel distance of the motion are investigated.


Method


Apparatus. A preplanned performance situation designed for the study of assembly motions is used. Figure 1 shows an S seated at the apparatus. The S's task is to reach to a supply bin, grasp a washer
from the bin, carry it to a work area, and place it 
on one of 12 upright pins. These same operations 
are repeated sequentially until each of the 12 washers 
has been transported from the bin to the work area 
and placed on an upright pin. The pins are *4 in. 
in diameter, arranged in a horizontal line parallel to 
the bin. The first washer is placed on the first pin 
at the left, the second washer on the second pin, 
and so on from left to right. The washers are 1 in. 
in diameter, approximately % in. thick, with an in- 
side diameter of 44 in. One flat side of each of the 
washers is marked with three red dots. On half of 
the trials, S is instructed to place the washers on 
the pins with the red dots facing up, and on the 
other half of the trials, he is instructed to place the 
washers on the pins with either side facing up. 

The 12 upright pins are held in a sliding plate 
which can be positioned at varying distances from 
the bin. The distance of the travel motions in the 
assembly operation is thereby varied. Three travel
distances between bin and assembly area are used: 
4 in., 8 in., and 12 in. 

Separate measurements of the duration of the 
manipulative and travel components of the assembly 
motion are automatically recorded in hundredths of 
a second by means of the electronic motion analyzer 
(11) pictured at the left in Fig. 1. 

The analyzer consists of a four-channel electronic 
relay circuit actuated by a subthreshold current. 
The subject operator acts as the key in the circuit. 

Fig. 1. The S seated at the apparatus. The electronic motion analyzer is pictured at the left. The S is holding a metal rod electrode in his left hand which enables him to complete the subthreshold circuit.


When S makes contact with the assembly plate 
while placing a washer on a pin, a relay is tripped 
and current is supplied to the electric clutch of the 
assembly-manipulation clock. This first clock meas- 
ures the total time per trial during which S is in 
contact with the assembly plate. As soon as the 
operator breaks contact with the plate and moves 
toward the bin to pick up a washer, the assembly-
manipulation clock stops and the unloaded-travel
clock is energized. This second clock records the 
total time per trial that is spent in moving from the 
assembly area to the bin. When a contact is made 
with the bin, the unloaded-travel clock stops and 
the grasp-manipulation clock is activated. This 
third clock records the total time per trial that S 
is in contact with the bin. The grasp-manipulation 
clock stops and the fourth or loaded-travel clock 
starts when the operator, with the washer in his 
grasp, begins to transport it back to the assembly 
area. The total time per trial for the loaded-travel 
movement is recorded on this fourth clock. This 
clock stops and the assembly-manipulation clock is 
energized again to begin timing another motion cycle 
when contact is made with the assembly plate. The 
recording apparatus is set in motion when, after a 
“ready” signal by E, S briefly touches the assembly 
plate on the way to pick up the first washer. 
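The clock switching just described behaves like a small state machine: each contact event stops the running clock and starts the next one. The sketch below assumes contact events arrive as chronologically ordered (time, event) pairs; the event names and timestamps are hypothetical. It simply stops timing at the last event and does not model how the apparatus ends a trial.

```python
# Clock that runs after each contact event, following the description above.
NEXT_CLOCK = {
    "plate_on": "assembly_manipulation",  # touch assembly plate
    "plate_off": "unloaded_travel",       # leave plate, move toward bin
    "bin_on": "grasp_manipulation",       # touch bin
    "bin_off": "loaded_travel",           # leave bin carrying a washer
}


def accumulate(events):
    """Total the time on each clock from ordered (time_s, event) pairs.

    Exactly one clock runs between consecutive events; time after the final
    event is not counted.
    """
    totals = dict.fromkeys(NEXT_CLOCK.values(), 0.0)
    running, last_t = None, None
    for t, event in events:
        if running is not None:
            totals[running] += t - last_t
        running, last_t = NEXT_CLOCK[event], t
    return totals


# Hypothetical trial fragment: the starting touch, then one washer cycle.
events = [
    (0.00, "plate_on"), (0.10, "plate_off"),
    (0.60, "bin_on"), (0.90, "bin_off"),
    (1.50, "plate_on"), (2.30, "plate_off"),
]
totals = accumulate(events)
print({clock: round(t, 2) for clock, t in totals.items()})
```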

Experimental design and procedure. Two independent variables, condition of discrimination and travel distance, are manipulated in this study. Of
primary interest is the discrimination variable. In 
the discrimination condition washers are placed on 
the pins with the red dots up. In the nondiscrimi- 
nation condition washers are placed on the pins with 
either face up. The second variable is distance of 
travel between the washer supply bin and the as- 
sembly plate. Three distances of travel are used. 
There are thus six combinations involving two con- 
ditions of discrimination and the three distances of 
travel. 

The experimental design takes the form of a 2 × 3 factorial in the cells of a replicated 6 × 6 latin square. This design affords simultaneous control of individual differences, sequence of treatments, and ordinal position of treatments in sequences. Twenty-four college students, 12 males and 12 females, are assigned to the rows of the random 6 × 6 latin square. Two males and two females are randomly
assigned to each of the six sequences of experimental 
conditions. Thus, each S performs under all six ex- 
perimental conditions. Each S performs in only his 
assigned sequence of conditions for the duration of 
the experiment; he is given three repetitions of his 
assigned sequence on each of the first four days. 
On the fifth and sixth days five such repetitions are 
given. The purpose of the additional repetitions on 
the fifth and sixth days is to obtain more reliable 
measures of response times. 
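The assignment of Ss to a randomized 6 × 6 latin square can be sketched as follows. This shows one common randomization (permuting the rows, columns, and symbols of a cyclic square, which does not sample uniformly from all latin squares), not necessarily the procedure the authors used. The condition labels and function names are hypothetical; the 4, 8, and 12 in. distances and the two-males-two-females rule come from the text.

```python
import itertools
import random
from collections import Counter

# Six treatment combinations of the 2 x 3 factorial (distances in inches).
CONDITIONS = list(itertools.product(("discrimination", "nondiscrimination"), (4, 8, 12)))


def random_latin_square(n, rng):
    """Randomize a cyclic n x n latin square by permuting rows, columns and symbols."""
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    symbols = rng.sample(range(n), n)
    return [[symbols[base[r][c]] for c in cols] for r in rows]


def assign_subjects(males, females, seed=0):
    """Assign two males and two females at random to each row (sequence)."""
    rng = random.Random(seed)
    square = random_latin_square(len(CONDITIONS), rng)
    males = rng.sample(males, len(males))
    females = rng.sample(females, len(females))
    plan = {}
    for i, row in enumerate(square):
        sequence = [CONDITIONS[k] for k in row]  # column j = ordinal position j
        for subject in males[2 * i:2 * i + 2] + females[2 * i:2 * i + 2]:
            plan[subject] = sequence
    return square, plan


square, plan = assign_subjects([f"M{i}" for i in range(1, 13)],
                               [f"F{i}" for i in range(1, 13)])
for row in square:
    print(row)
print(Counter(tuple(seq) for seq in plan.values()).most_common(1)[0][1])  # 4
```

Each of the six rows is a distinct sequence of all six conditions, so each sequence ends up with exactly four Ss.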
Table 1
Summary of Analysis of Variance of Assembly-Manipulation Component (Day 6)

Source of Variation             df   Sum of Squares   Mean Square
Discrimination                   1      18.07666        18.07666
Travel distance                  2        .76534          .38267
Interaction                      2        .04063          .02032
Sequences (rows)                 5      12.387389        2.477478
Trials (columns)                 5        .151955         .030391
Square uniqueness               20       4.408936         .220447
Indiv. Ss within sequences      18      29.96877         1.664932
Error                           90      14.53602          .161511
Total                          143      80.3357

** Significant at 1% confidence level.


Table 2
Summary of Analysis of Variance of Grasp-Manipulation Component (Day 6)

Source of Variation             df   Sum of Squares   Mean Square      F
Discrimination                   1        .702802         .702802   14.83**
Travel distance                  2        .287150         .143575    3.03
Interaction                      2        .044206         .022103    2.14
Sequences (rows)                 5      15.838891        3.167778    1.98
Trials (columns)                 5        .244566         .048913
Square uniqueness               20       1.0112           .05056
Indiv. Ss within sequences      18      28.788717        1.599373
Error                           90       4.203268         .046702
Total                          143      51.1208

** Significant at 1% confidence level.


Durations are recorded for the four component 
movements of the total task. A median score is ob- 
tained for each of the component movements under 
the six conditions on each of the six days. This 
median measure is used to eliminate effects of ex- 
treme scores caused by dropping washers and other 
uncontrollable factors. 

In addition to the test trials, each S performs one calibration trial each day. Because of differences in reaction time of the relays in the circuit, and because of factors intrinsic to each clock mechanism, clock readings for the same elapsed time differ slightly. In the calibration trial, discrepancies in the clock readings are noted and the appropriate additive correction is made so that all clocks are comparable. Corrected median times are the basis for all further statistical treatment of the data.
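The scoring step (median over trials, then an additive clock correction) can be sketched briefly. The sketch assumes each clock's correction is the difference between a known reference interval and that clock's calibration reading; the text does not say what reference was used. All names and numbers are hypothetical.

```python
from statistics import median


def additive_corrections(calibration_readings, reference_s):
    """Correction per clock so that each clock agrees with the reference interval."""
    return {clock: reference_s - reading for clock, reading in calibration_readings.items()}


def corrected_median(trial_times, correction):
    """Median over trials (robust to a dropped-washer outlier) plus the clock's correction."""
    return median(trial_times) + correction


# Hypothetical calibration: the same 1.00-s interval read on two clocks.
corrections = additive_corrections({"loaded_travel": 1.02, "unloaded_travel": 0.98}, 1.00)

# Hypothetical loaded-travel times for five trials; 2.40 s is a dropped washer.
times = [0.62, 0.58, 2.40, 0.60, 0.61]
print(round(corrected_median(times, corrections["loaded_travel"]), 2))  # 0.59
```

The median ignores the 2.40-s trial entirely, which is the reason the text gives for preferring it to the mean.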


Results 


Results of this study are divided into two general sections for presentation: (a) the effects of the discrimination and distance variables on the component movements in the assembly task, and (b) the effects of practice on the component movements under the different experimental conditions.

Table 3
Summary of Analysis of Variance of Unloaded-Travel Component (Day 6)

Source of Variation             df   Sum of Squares   Mean Square      F
Discrimination                   1        .923190         .923190   30.37**
Travel distance                  2      18.515754        9.257877  304.62**
Interaction                      2        .140145         .070072    2.30
Sequences (rows)                 5       4.810089         .962018    2.92
Trials (columns)                 5        .168244         .033649    1.11
Square uniqueness               20        .716247         .035812    1.23
Indiv. Ss within sequences      18      50.604561        2.811364   92.50**
Error                           90       2.626870         .029187
Total                          143      78.50510

** Significant at 1% confidence level.

Table 4
Summary of Analysis of Variance of Loaded-Travel Component (Day 6)

Source of Variation             df   Sum of Squares   Mean Square      F
Discrimination                   1       8.40999         8.40999   144.21**
Travel distance                  2      23.93687        11.968435  205.23**
Interaction                      2        .061329         .030664    1.90
Sequences (rows)                 5      12.741931        2.548386    1.38
Trials (columns)                 5        .424414         .084883
Square uniqueness               20       1.511796         .075590
Indiv. Ss within sequences      18      63.507149        3.528175
Error                           90       4.903221
Total                          143     115.4967

** Significant at 1% confidence level.

Effects of variables on component move- 
ments. Data from day 6 are subjected to 
four separate analyses of variance,² one for
each of the component movements. These 
analyses, summarized in Tables 1, 2, 3, and 
4, bring out the fact that the discrimination 
variable produces significant differences in the 
duration of each of the four components of 
the movement pattern, viz., assembly manipu- 
lation, grasp manipulation, loaded travel, and 
unloaded travel. The travel distance variable 
produces a significant difference in the loaded- 
and unloaded-travel components but does not 
significantly affect either assembly manipula-
tion or grasp manipulation. The interaction
between the discrimination and
distance variables is not significant for any 
of the movement components. Individual Ss 
within sequences, or between-Ss differences, 
are significant in all four analyses. Sequences 
and trials do not produce significant differ- 
ences. 
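As a worked check on these analyses, the F ratios printed in Table 4 can be reproduced if each treatment mean square is divided by a pooled error term that combines the square-uniqueness and residual sums of squares, (SS_sq + SS_err)/(20 + 90). This pooling is our inference from the printed numbers, not a procedure the authors state; dividing by the residual mean square alone (about .0545) would give roughly 154 rather than 144.21 for discrimination. The sums of squares and df below are copied from Table 4, and the helper names are hypothetical.

```python
# (sum of squares, df) for the loaded-travel component, day 6, from Table 4.
TABLE4 = {
    "discrimination": (8.40999, 1),
    "travel_distance": (23.93687, 2),
    "square_uniqueness": (1.511796, 20),
    "error": (4.903221, 90),
}


def mean_square(source, table):
    ss, df = table[source]
    return ss / df


def pooled_error(table, pooled=("square_uniqueness", "error")):
    """Mean square of the pooled terms (an inferred choice, not stated in the text)."""
    ss = sum(table[s][0] for s in pooled)
    df = sum(table[s][1] for s in pooled)
    return ss / df


def f_ratio(source, table):
    return mean_square(source, table) / pooled_error(table)


for source in ("discrimination", "travel_distance"):
    print(source, round(f_ratio(source, TABLE4), 2))
```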

The results reported above are pictured 
graphically in Fig. 2 and 3. Figure 2 shows 
the differential effects of discrimination on 
the mean times for the four components of 
the assembly task. Addition of the discrimi- 
nation condition, i.e., the assembly of the 
washers with the red dots facing up, increases 
the durations of the assembly manipulation 
by 15.1% and of the grasp manipulation by 
7.7%. The durations of the unloaded-travel 
movement and the loaded-travel movement 
are increased 4.5% and 12.3% respectively. 
All of these differences are significant at the 
1% level of confidence. 

Figure 3 pictures the effect of the travel-distance variable on the four components of the motion for both discrimination and nondiscrimination conditions. An increase in the travel distance between bin and assembly area produces a significant increase in the durations of the two travel components of the motion, while the durations of the two manipulation components are not significantly altered. The travel-distance variable affects the component movements in the same manner under both the discrimination and nondiscrimination conditions.

² Preceding the analyses of variance, the data were subjected to the Bartlett chi-square test for homogeneity of variance. The data all satisfied the assumption of random sampling from populations with a common variance.

Fig. 2. Effect of discrimination variable on duration of component movements. Data for all three distances are combined.

Coefficients of reliability for the four com- 
ponent movement scores are obtained by cor- 
relating the day 5 and day 6 scores for the 
separate component movements. The coeffi- 
cients of reliability for all components except 
assembly manipulation are of the order of 
+ .85 to + .95. The assembly-manipulation 
scores are singularly unreliable. In all cases 
the travel components have a greater day-to- 
day reliability than the manipulative com- 
ponents of the movement. 
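The reliability coefficients above are product-moment correlations between day 5 and day 6 scores. A minimal sketch, assuming one score per S for a given component on each day (the text does not state the unit of correlation); the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical loaded-travel medians (seconds) for six Ss on days 5 and 6.
day5 = [0.61, 0.55, 0.70, 0.66, 0.58, 0.74]
day6 = [0.60, 0.56, 0.68, 0.65, 0.57, 0.72]
print(round(pearson_r(day5, day6), 3))
```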

Effects of practice on the component movements. Figure 4 represents learning curves over a six-day period for the separate component movements. The discrimination and nondiscrimination conditions are plotted separately with all travel distances combined. The values for each day are based on the mean of the median scores of the 24 Ss. The assembly-manipulation time is uniformly the highest over the six days. The grasp-manipulation time is uniformly the lowest of the four measures of movement duration. The times for the loaded- and unloaded-travel movements are intermediate. The loaded-travel movements always require more time than the unloaded-travel movements. It can be seen that the over-all effect of the discrimination condition is a general elevation of the learning curves for the separate components. The differences between discrimination and nondiscrimination which are prevalent at the outset are not changed as a result of practice. This specific discrimination as a variable, then, seems to have no effect on the rate of learning.

Fig. 3. Effect of travel-distance variable on duration of component movements.

Fig. 4. Learning curves for discrimination and nondiscrimination conditions (travel distances combined).

The significance of the observed learning
effects has been examined; t tests of the dif-
ferences between day 1 and day 6 indicate
that all components of movement display sig- 
nificant learning effects. 

When we consider all experimental con- 
ditions together, the assembly-manipulation 
component averages a 14.3% decrease from 
day 1 to day 6. The grasp-manipulation 
time decreases 22.6% from day 1 to day 6. 
The unloaded-travel time decreases 12.8%, 
and the loaded-travel time decreases 8.2%. 
Thus learning effects do not appear as a uniform phenomenon in the component movements of the assembly task. The manipulation components are relatively more affected by practice than the travel components.
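The percentages in this section, like the discrimination increases reported earlier, are ordinary percent changes relative to the first value. A minimal sketch: the first part uses hypothetical day 1 and day 6 times; the second part averages the decreases reported in the text to make the manipulation-versus-travel comparison explicit.

```python
def percent_change(before, after):
    """Percent change from `before` to `after` (negative = decrease)."""
    return 100.0 * (after - before) / before


# Hypothetical assembly-manipulation times (s): day 1 vs. day 6.
print(round(percent_change(1.00, 0.857), 1))  # -14.3

# Day 1 to day 6 decreases reported in the text (%).
decrease = {
    "assembly manipulation": 14.3,
    "grasp manipulation": 22.6,
    "unloaded travel": 12.8,
    "loaded travel": 8.2,
}
manipulation = (decrease["assembly manipulation"] + decrease["grasp manipulation"]) / 2
travel = (decrease["unloaded travel"] + decrease["loaded travel"]) / 2
print(round(manipulation, 2), round(travel, 2))  # 18.45 10.5
```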


Discussion and Summary 


Electronic methods of motion analysis have 
been applied to record separately and auto- 
matically the durations of four component 
movements in an assembly task. These com- 
ponents are assembly manipulation, grasp 
manipulation, loaded travel, and unloaded 
travel. This study has been concerned pri- 
marily with the determination of the changes 
that occur in the duration of the component 
movements of a motion cycle when a specific 
visual discrimination is imposed on part of 
this cycle. 

Time and motion study engineers, using 
Gilbreth’s therblig system, apply the therblig 
“select” when an operator must make a dis- 
crimination or choice. It is implicit in this 
formulation that the “select” operation can
be identified and separately timed. Any in- 
crease in the duration of the total task 
brought about by the discrimination or choice 
would be identical with the duration of this 
“select” operation. Presumably, the separate 
durations of the other component operations 
would be unaltered by the addition to the 
task of the choice operation. 

Results of the present study, however, indi- 
cate that the durations of all four components 
of a cyclic sequential movement pattern are 
increased significantly by the addition of a 
specific visual discrimination. In other words, 
it appears that the generalized effect of the 
discrimination condition on the entire motion 
pattern cannot be accounted for accurately by 
the addition of a single new perceptual therblig.
Thus, although the therblig analysis of
motion has gained widespread acceptance in 
practical industrial situations, its adequacy in 
providing a systematic framework for guiding 
future research efforts on the role of percep- 
tion in motion is seriously questioned. Prac- 
tically, the procedure of adding together 
standard therblig times to establish by extra- 
polation a time for a new assignment or task 
not itself subjected to motion and time study 
may be inappropriate if the new task involves 
a “perceptual therblig.” Results of the present study suggest that the role of discrimination
in motor coordination is a complex one,
serving to change all the temporal relations 
within the pattern of motion. 

It should be emphasized that the discrimi- 
nation condition in the present experiment 
sometimes involved an additional preposition- 
ing response on the part of S. In order to 
assemble the washer with a particular side 
facing up, the operator had to turn over some 
washers during the loaded travel, thus in- 
creasing the duration of this component move- 
ment. The reason for the increased duration 
in the other components is certainly less ob- 
vious since the pattern of movement appears 
to be unchanged from that required in the 
nondiscrimination condition. 

Future research in this area should attempt 
to appraise the effects of the locus and the 
complexity of the discriminative response on 
motor coordination. One might ask, how do 
discriminations made at the parts-supply area 
and at the assembly area differ in their effects 
on the component movements in an assembly 
task? How are the component movements in a skilled motion pattern altered as a function of the difficulty or complexity of the discrimination? These questions may be answered by systematically manipulating the locus of the discrimination throughout the motion cycle and by varying that discrimination along several levels of complexity.

Other general results of this experiment may be noted. Variations in the distance of travel between bin and assembly plate do not significantly affect either of the manipulative components of the task. These observations lend support to the results reported by Barnes (1). In addition, the course of learning in each of the component movements in this study appears to be unaltered by the specific discrimination. The differential effects of practice upon different components in a pattern of motion found in this study have been observed in previous investigations of assembly motions (11), factory operations (2), panel control movements (9, 10, 12), and visual target tracking (6, 7). These differential effects, which have been labeled "learning discrepancy in component movements" (6), consist of differences in rate of learning, in the course of learning, and in the over-all change resulting from learning for travel and manipulative movements. None of the generalized learning theories accounts for the specific aspects of performance noted in these studies of coordinated motion.


Received January 22, 1954. 
Note on the Effect of Viewing Angle on Accuracy of
Reading Quantitative Scales 


K. F. H. Murrell¹


Tube Investments Ltd., Birmingham, England 


Cohen, Vanderplas, and White (1) have re- 
ported experiments on the effect of viewing 
angle on accuracy of reading dials. This ef- 
fect can, however, be computed from results 
of experiments carried out in the British Ad- 
miralty. These results have been published 
(3) although the experimental details have 
appeared only in an Admiralty Report (2). 

A finding of the Admiralty experiments 
was that reading accuracy continued to im- 
prove until the subjects (Ss) had made more 
than 2000 readings. In practice this is not a 
very large number for a user to make, and it 
would seem that any findings based on less
practice may be of doubtful practical
use. Cohen et al.’s Ss apparently made only 
25 readings on each type of dial and they 
could not do better than about 87% accu- 
racy, though the unsatisfactory graduation 
systems used probably contributed to this re- 
sult. They were thus very unpracticed. 

Another finding was that practiced Ss will 
read well-designed dials consistently with 
98% accuracy provided that they do not have 
to interpolate a scale interval into more than 
five parts and provided that the scale marks 
are not nearer together than a critical sepa- 
ration depending on the viewing distance and 
the required interpolation. As the marks get 
closer together below this critical separation, 
reading accuracy falls logarithmically; it will 
therefore vary on a log sine curve with view- 
ing angles which make the mark separation 
appear less than the critical separation. 

For the reading distance of 28 in. and the
10 × 5 interpolation used by Cohen et al., the
critical separation is .048 in., so with their
separation of .124 in. the performance curve
should have been a straight line at 2% error
until a viewing angle of 22° was reached.


1 Now at the Department of Psychology, Univer- 
sity of Bristol. 
And strictly speaking, this curve should apply
only to parts of the scale lying near 6 o’clock
and 12 o’clock. At 3 o’clock and 9 o’clock,
scale mark length is probably an important
factor.

From these Admiralty data (which are 
somewhat less reliable for 10 × 5 than for
10 × 2) the theoretical performance curve has
been calculated for a scale having marks .060 
in. apart (Fig. 1). It is assumed that a good 
graduation system is used. It would be in- 
teresting to see whether this theoretical curve 
is confirmed by experiment. 
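The 22° figure follows from simple foreshortening if the viewing angle is taken as the angle between the line of sight and the plane of the dial face: the apparent mark separation then shrinks as the sine of that angle, and it reaches the critical separation when sin θ = .048/.124. The sketch below assumes that geometry (the note does not spell it out); the function name is hypothetical, and the .060 in. case further assumes that the same .048 in. critical separation applies, which the note does not state.

```python
import math


def critical_viewing_angle(mark_separation_in, critical_separation_in):
    """Angle (degrees) between line of sight and dial face at which the
    foreshortened mark separation just equals the critical separation.

    Assumption: apparent separation = actual separation * sin(angle),
    with the angle measured from the plane of the dial (90 = head-on).
    """
    ratio = critical_separation_in / mark_separation_in
    if ratio >= 1:
        # Marks are at or below the critical separation even when seen head-on.
        return 90.0
    return math.degrees(math.asin(ratio))


# Cohen et al.'s scale: .124 in. marks, .048 in. critical separation.
print(round(critical_viewing_angle(0.124, 0.048), 1))  # → 22.8
# Hypothetical: the same .048 in. critical separation applied to .060 in. marks.
print(round(critical_viewing_angle(0.060, 0.048), 1))  # → 53.1
```

The computed 22.8° agrees with the roughly 22° quoted above; below that angle, accuracy would begin to fall off along the log sine curve.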


Received April 2, 1954. 


References 


1. Cohen, J., Vanderplas, J. M., & White, W. J. 
Effect of viewing angle and parallax upon 
accuracy of reading quantitative scales. J. 
appl. Psychol., 1953, 37, 482-488. 

2. Laurie, W. D., McCarthy, C., & Murrell, K. F. H. 
A study of the relationship between dial size, 
reading distance and accuracy of reading. 
Brit. Admiralty, Naval Motion Study Unit, 
Rep. No. 47, Nov. 1951. 

3. Murrell, K. F. H. The design of instrument 
scales. Instrum. Pract., 1952, 6, 225-232. 





The Journal of Applied Psychology
Vol. 39, No. 1


A Factor Analysis of Physical Proficiency and 
Manipulative Skill 


Walter E. Hempel, Jr. and Edwin A. Fleishman 


Air Force Personnel and Training Research Center 1 


This paper represents one in a series of fac- 
torial studies (see, e.g., 1, 2, 3, 4, 5, 6) con- 
cerned with the organization of abilities in 
certain of the relatively unexplored aptitude 
areas of motor skill. In the present study 
primary attention was given to the areas of 
gross physical proficiency and fine manipula- 
tive performance. Specifically, the analysis 
was undertaken (a) to investigate the inter- 
dependence of abilities contributing to indi- 
vidual differences in these two areas; and 
(b) to identify possible ability categories
which might be useful and meaningful in de- 
scribing performance in these areas. 


Procedure 


The analysis is based on the intercorrelations 
among 46 tests of a battery originally developed in
this laboratory by J. B. Reynolds, J. A. Adams, 
and Ina McD. Bilodeau. The tests fell into three 
general categories: (a) seventeen manipulative appa- 
ratus tests; (b) six printed tests; and (c) twenty- 
three gross physical performance tests. The physi- 
cal performance tests are similar to those given in 
school physical education departments or military 
training situations to evaluate physical proficiency. 
However, the tests in this battery were highly stand- 
ardized. All the tests were pretested on samples of 
airmen and scored on a pass-fail basis. The pass- 
fail cutoff point, determined from the pretest results, 
was located so that approximately 50% of the ex- 
aminees would pass each test. The complete bat- 
tery had been administered to a sample of 400 basic 
trainee airmen. The list of tests and very brief de- 
scriptions of the operations required by each test 
follow. 


Manipulative Tests 


1. Cox pin board. Wrap a cord in a prescribed 
manner around pins closely spaced in rows on a 
board. 

2. Cox eye board. Thread a cord through a small 
hole in each of a series of pegs arranged in rows on 
a board. 


1This research was carried out at Lackland Air 
Force Base, San Antonio, Texas, in support of Proj- 
ect 7703. Permission is granted for reproduction, 
translation, publication, use, and disposal in whole 
or in part by or for the United States Government. 
3. Santa Ana peg turning. Lift, rotate, and replace
a series of pegs in a pegboard. 

4. Nut and bolt. Complete a series of nut and bolt 
assemblies through holes in a panel. 

5. Track steadiness. Trace through an irregular slot 
pathway with a stylus, without touching the sides 
of the path. 

6. Hex nut steadiness. Stack a series of small hexa- 
gonal nuts one on top of the other. 

7. Pin punch, right. Punch a tack through each of 
a series of tiny holes arranged in a pattern on a 
template, using the right hand. 

8. Pin punch, left. Same as 7, except using left 
hand. 

9. Speed of manipulation “A.” Remove a number
of small washers from a series of pegs. 

10. Speed of manipulation “B.” Replace the small 
washers on the pegs. 

11. Ball and pipe. Drop a small ball through a
pipe, catch it as it emerges from the bottom of the 
pipe, and repeat the process. 

12. Speed of reaction. Catch a strip of metal with 
the thumb and forefinger as quickly as possible when 
it is released. 

13. Rotary aiming. Strike a series of buttons ar- 
ranged in a circle, as rapidly as possible, with the 
end of the index finger. 

14. Marble board. Place marbles, one at a time, in
grooves in a board. 

15. Dowel manipulation. Arrange a series of sus- 
pended pieces of dowel so as to make a continuous 
strip of dowel. 

16. Restricted manipulation. Insert bolts through a 
series of holes in a panel by reaching behind it and 
avoiding the baffles encountered. 

17. VDL rings. Remove small rings from a small 
pole. 


Printed Tests 


18. Circle dotting. Place three dots in each of a se- 
ries of small circles. 

19. Irregular dotting pursuit. Place one dot in each 
of a series of small circles arranged in an irregular 
pattern. 

20. Speed of square marking. Place an “X” pre- 
cisely in each of a series of small squares. 

21. Two-hand coordination (printed). Draw a line 
through a pair of vertical lines with one hand, while 
drawing a line through a pair of horizontal lines 
with the other hand. 

22. Pattern discrimination. From the shape and 
shade of combinations of geometric figures, respond 
according to prearranged directions. 
23. Discrimination square marking. Mark the
correct number of squares for each of a series of
problems, according to a two-digit code.


Physical Performance Tests 


24. Chinning. While hanging by the arms from a 
bar, pull the body up until the chin is above the 
bar, and repeat. 

25. Jump and click heels. Jump and click heels as 
often as possible before returning to the floor. 

26. Jump and turn. Leap in the air and turn 
around as far as possible before returning to the 
floor. 

27. Push-ups. Push up body from prone position, 
until arms are fully extended, and repeat. 

28. Cable jump. Jump through a cable held in the 
hands and land on both feet in front of the cable.

29. Two-foot rail balance. Maintain balance with
both feet placed heel to toe on a 1-in. board. 

30. Foot balance I. Maintain balance on the left 
foot, with the right foot resting on the inside of the 
left knee. 

31. Foot balance II. Maintain balance on the left
foot, with the right foot behind the left knee. 

32. Rail walking. Walk on a 1-in. rail, heel to toe, 
with hands on back of hips. 

33. Circular tie walking. Move around a circle in 
a specified manner with feet spread apart on the 
edge of a circular rail. 

34. Rising from supine. From supine position, rise 
to standing position without using hands or arms.

35. Leg bend. Squat slowly on the right leg only,
until the thigh touches the calf of the leg. 

36. Leg raising. Raise legs to the level of the head 
while in an upright sitting position. 

37. Backing down wall. Bend backwards as far as 
possible. 

38. Table vault. Vault over a table. 

39. Kicking height. Kick one foot as high as the 
head. 

40. Backward jump. Jump backwards as far as pos- 
sible. 

41. Abdominal pivot. Push body around with hands 
while lying on stomach and keeping back arched.

42. Hurdle jump. Accomplish an ordinary standing
high jump with certain specifications. 

43. Jump and balance. Jump onto the edge of a 
box and maintain balance. 

44. Rate of jump. Jump forward up several steps 
and then backward down the steps, keeping feet to- 
gether. 

45. Toe touching. Bend forward and downward as 
far as possible without bending knees. 

46. Jump and touch. Jump up and make a mark 
on the wall as high as possible. 
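Because every test was scored pass-fail, with the cutoff placed so that about half of the examinees passed, each pair of tests yields a 2 × 2 table of counts, and the intercorrelations analyzed under Results are tetrachoric. The paper does not say how these coefficients were computed. As a rough illustration only, the sketch below uses the classical cosine-pi approximation, which is most accurate when the splits are near 50%, as they were here by design. The function names are hypothetical.

```python
import math


def two_by_two(passed_x, passed_y):
    """Tally a 2 x 2 table for two pass-fail tests given to the same examinees.

    Returns (a, b, c, d): a = passed both, b = passed x only,
    c = passed y only, d = failed both.
    """
    a = b = c = d = 0
    for px, py in zip(passed_x, passed_y):
        if px and py:
            a += 1
        elif px:
            b += 1
        elif py:
            c += 1
        else:
            d += 1
    return a, b, c, d


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation."""
    if a * d == 0 and b * c == 0:
        return float("nan")  # degenerate table: coefficient undefined
    if b * c == 0:
        return 1.0  # no discordant pairs
    if a * d == 0:
        return -1.0  # no concordant pairs
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


print(two_by_two([True, True, False, False], [True, False, True, False]))  # → (1, 1, 1, 1)
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))  # → 0.809
```

The counts in the last line are made up; with a balanced table (ad = bc) the approximation returns zero, and it approaches ±1 as one diagonal empties.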


Results 


The tetrachoric correlations among the test
variables (Table 1)² were subjected to a
Thurstone centroid factor analysis. Factor
extraction was continued well beyond the
point where any meaningful factor variance
was suspected to be present. The 17 centroid
factors extracted are presented in Table 2.
Orthogonal rotations of the primary axes
were accomplished by Zimmerman’s graphical
method (9) until simple structure and
positive manifold were closely approximated.
Table 3 presents the final solution of rotated
factor loadings.

2 Tables 1, 2, and 3 have been deposited with the
American Documentation Institute. Order Docu-


Interpretation of Factors 


Rotated loadings of .30 or greater were con- 
sidered significant in defining the factors. 

Factor I appears to be the same as that 
factor called Aiming in some previous studies 
(e.g., 2, 4, 5, 7). 


No. Variable Loading
19 Irregular dotting pursuit (printed) .82
20 Speed of square marking (printed) .71
18 Circle dotting (printed) .69
21 Two-hand coordination (printed) .58
13 Rotary aiming .38
22 Pattern discrimination (printed) .36
3 Santa Ana peg turning .33
9 Speed of manipulation “A” .32
1 Cox pin board .31


It is defined as the ability to perform 
quickly and precisely a series of directed 
movements requiring eye-hand coordination. 
The largest loadings on this factor are for 
printed tests requiring rather exact visual 
alignment of the response. Loadings are 
highest as the area to be marked in becomes 
smaller. 

Factor II is defined as Limb Strength. 


No. Variable Loading 
27 Push-ups .57
24 Chinning .55
46 Jump and touch .33


Primary emphasis in both of the tests with
the highest loadings on this factor is on the
strength of the arms. The presence of “Jump
and touch” suggests that this factor may extend
to leg strength as well.

ment No. 4290 from the ADI Auxiliary Publications
Project, Photoduplication Service, Library of Con-
gress, Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies. Make
checks payable to Chief, Photoduplication Service,
Library of Congress.

Factor III is defined as Gross Body Co-
ordination. 


No. Variable Loading 
34 Rising from supine .57
28 Cable jump .56
42 Hurdle jump .52
24 Chinning .52
30 Foot balance I .36
32 Rail walking .31


It appears to involve the ability to co- 
ordinate movements where more of the entire 
body is involved. The tests most highly 
loaded on this factor require that the trunk 
as well as the limbs be employed simultane- 
ously in accomplishing the task. 

Factor IV is defined as Equilibrium Bal- 
ance. 


No. Variable Loading 
31 Foot balance II .56
29 Two-foot rail balance .47
30 Foot balance I .40


It represents the ability to maintain balance 
while in an abnormal stance or position. This 
feature is the crucial characteristic of all the 
tests loaded on this factor. 

Factor V is identified as Energy Mobiliza- 
tion. 


No. Variable Loading 
46 Jump and touch .65
40 Backward jump .46
42 Hurdle jump .39
43 Jump and balance .35
44 Rate of jump .35
38 Table vault .34


All the tests most heavily loaded on this 
factor are tests in which the objective is to 
jump as far or as fast as possible, where no 
accuracy is required. It appears that this 
factor involves the ability to mobilize quickly 
and effectively a maximum of energy or force. 
In the present analysis it is involved pri- 
marily in tasks of jumping. 

Factor VI is defined as Trunk Strength. 


No. Variable Loading 
41 Abdominal pivot .64
27 Push-ups .43
36 Leg raising .43


It is defined by three tests in which the 
starting position is either prone or supine, 
and in which performance is dependent on 
the strength of the trunk muscles. This fac- 
tor, therefore, appears to involve the strength 
potential of the trunk muscles. 

Factor VII is identified as a Doublet with 
doubtful psychological significance. 


No. Variable Loading
7 Pin punch, right .71
8 Pin punch, left .69
13 Rotary aiming .32


The significant loadings on this factor are
primarily those of the “Pin punch” subtests,
which involve the same task performed with
either the right or the left hand. The nature
of the variance these tests share is therefore
obscure.


Factor VIII appears to be a Reasoning 
factor. 


No. Variable Loading
23 Discrimination square marking (printed) .68
22 Pattern discrimination (printed) .46
16 Restricted manipulation .46


The tests defining it each require the inter- 
pretation of relatively complex relationships 
in arriving at a successful solution to the 
problem. This is true of both the printed
tests and the apparatus test. It appears
that this factor represents a nonverbal
reasoning factor. 

Factor IX is identified as Leg Suppleness 
or Flexibility. 


No. Variable Loading
45 Toe touching
39 Kicking height .62
35 Leg bend
34 Rising from supine
37 Backing down wall


It is defined primarily by three tests that 
require the leg muscles to endure considerable 
strain or distortion. It seems, therefore, that 
this factor involves the capacity of the leg 
muscles to resist deformity and to recover 
quickly from undue strain. 

Factor X is defined as Arm-Hand Steadi- 
ness. 
No. Variable Loading
5 Track steadiness .56
2 Cox eye board .47
6 Hex nut steadiness .42
3 Santa Ana peg turning .32


Tests defining this factor best are manipu- 
lative apparatus tests which require the abil- 
ity to make precise, steady arm-hand move- 
ments of the kind that minimize speed and 
strength. The “Track steadiness” test has 
defined this factor in a previous study (2). 

Factor XI is tentatively identified as Trunk 
Flexibility. 

No. Variable Loading
37 Backing down wall .50
38 Table vault .37


It is rather poorly defined with only two 
tests having significant loadings on the factor. 
Both these tests, however, require the back 
and abdominal muscles to endure strain 
through either bending the body backwards 
or twisting it around. It therefore seems 
probable that this factor involves the ability 
of the trunk muscles to endure strain and 
distortion. 

Factor XII is identified as Manual Dex- 
terity. 


No. Variable Loading 
14 Marble board .51
17 VDL rings .44
15 Dowel manipulation .40


It appears to involve the ability to make 
skillful, well-coordinated arm-hand manipula- 
tions, and is best measured by certain of the 
manipulative apparatus tests. 

Factor XIII is tentatively defined as Per- 
formance or Dynamic Balance. 


No. Variable Loading 
25 Jump and click heels .54
43 Jump and balance .53
44 Rate of jump .46


In each test defining this factor the crucial 
feature appears to be maintaining balance 
while in the process of some other perform- 
ance, such as jumping. This is distinguished 
from Factor IV (Equilibrium Balance) which 
involves a more static kind of balance. 
Whether or not the present balance factor is
restricted to jumping activities remains to be
tested by future studies.

Factor XIV is identified as Finger Dex- 
terity. 


No. Variable Loading 
9 Speed of manipulation “A”
11 Ball and pipe .45
4 Nut and bolt .39
16 Restricted manipulation .35


It appears in tasks which emphasize skill- 
ful manipulations primarily with the fingers. 
It is distinguished from Manual Dexterity 
(Factor XII) which does not emphasize such 
finger movements. The separation of these 
two factors confirms previous findings (e.g., 
2, 4, 8). 

Factor XV is very tentatively identified as 
Jump Performance. 


No. Variable Loading
26 Jump and turn
25 Jump and click heels .50
42 Hurdle jump .33
38 Table vault .32
46 Jump and touch .32


The identifying characteristic of the tests 
with highest loadings on this factor appears 
to be jumping and completing some perform- 
ance after jumping and before landing. It 
appears, therefore, that this factor involves 
the ability to perform a task while in the 
process of jumping. However, a more psy- 
chologically meaningful interpretation is lack- 
ing at present. 

Factor XVI and Factor XVII are Residual 
factors with no apparent psychological mean- 
ing. 
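Residual factors such as XVI and XVII are expected when, as here, extraction is deliberately continued past the point of meaningful variance: in the centroid method each factor is taken from the residual matrix left after removing the previous ones. The following is a simplified sketch of successive centroid extraction, not a reproduction of the authors' computations. Two details are assumptions: communalities are estimated by the largest absolute off-diagonal entry in each column, and sign reflection flips one variable at a time until no column sum is negative. The function name and the example matrix are hypothetical.

```python
def centroid_factors(r, n_factors):
    """Extract successive centroid factors from a correlation matrix r
    (a list of lists, at least 2 x 2). Returns one loading list per factor.

    Assumptions: diagonal communality = largest absolute off-diagonal entry
    in the column; reflection flips the variable with the most negative
    column sum until all column sums are non-negative.
    """
    p = len(r)
    residual = [list(map(float, row)) for row in r]
    factors = []
    for _ in range(n_factors):
        # Put communality estimates on the diagonal of the residual matrix.
        for j in range(p):
            residual[j][j] = max(abs(residual[i][j]) for i in range(p) if i != j)
        # Reflect variables until every column sum is non-negative.
        signs = [1.0] * p
        while True:
            sums = [sum(signs[i] * signs[j] * residual[i][j] for i in range(p))
                    for j in range(p)]
            worst = min(range(p), key=lambda j: sums[j])
            if sums[worst] >= 0:
                break
            signs[worst] = -signs[worst]
        total = sum(sums)
        if total <= 0:
            break  # nothing left to extract
        # Loading = reflected column sum / sqrt(total), reflection undone.
        loadings = [signs[j] * sums[j] / total ** 0.5 for j in range(p)]
        factors.append(loadings)
        # Subtract this factor's contribution to form the next residual matrix.
        for i in range(p):
            for j in range(p):
                residual[i][j] -= loadings[i] * loadings[j]
    return factors


# Illustrative one-factor matrix (hypothetical, not the study's data).
true_loadings = [0.8, 0.7, 0.6, 0.5]
r = [[1.0 if i == j else true_loadings[i] * true_loadings[j] for j in range(4)]
     for i in range(4)]
factors = centroid_factors(r, 2)
print([round(x, 2) for x in factors[0]])  # → [0.75, 0.71, 0.63, 0.55]
```

The first centroid factor recovers the generating loadings only approximately, partly because the communality estimate for the first variable (.56) falls short of its true value (.64); the second factor extracted here is small and carries only residual noise.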

Summary and Conclusions 


Fifteen factors were identified to account 
for performance on the 46 experimental tests. 
While the precise nature of several of the fac- 
tors is yet uncertain, some general conclusions 
appear possible. 

1. The results indicate that the abilities 
contributing to performance on gross physical 
tasks are quite independent of those con- 
tributing to fine manipulative skill. No fac- 
tors were found that overlapped these areas. 

2. Pending the identification of additional
factors or clarification of the present ones,
the factors identified suggest a possible
classification of ability areas primarily involved
in a wide range of motor tasks, especially those
of a gross physical nature. They at least
point up a way of organizing more meaningfully
the hodgepodge of such tests generally
used in evaluating physical proficiency.

The nine factors identified in the physical 
performance tests appear to fit under five 
general categories: 

a. Strength—of the limbs (Factor II) and 
of the trunk (Factor VI). 

b. Flexibility (ability of the muscles to en- 
dure and recover from strain and distortion) 
—of the legs (Factor IX) and of the trunk 
(Factor XI). 

c. Balance—static or equilibrium balance 
(Factor IV) and dynamic or performance 
balance (Factor XIII). 

d. Gross Body Coordination (Factor III) 
—ability to coordinate muscular movements 
where the trunk as well as the limbs are
employed simultaneously.

e. Energy Mobilization (Factor V)—abil- 
ity to mobilize quickly and effectively a maxi- 
mum of energy or force. 

While better tests of these factors could be 
developed, these categories at least provide a 
functional classification of areas that might 
receive attention in assessing physical pro- 
ficiency and deficiency, or in sectioning classes 
for further training and development. 

3. The four factors identified in the
manipulative tests have each been identified in
previous studies. These were defined as: 

a. Manual Dexterity (Factor XII)—abil- 
ity to make skillful, coordinated arm-hand 
movements. 

b. Finger Dexterity (Factor XIV)—abil- 
ity to make skillful manipulations with the 
fingers. 

c. Arm-Hand Steadiness (Factor X)—ability
to make precise, steady arm-hand movements
of the kind that minimize speed and
strength.

d. Aiming (Factor I)—ability to make
accurately directed positioning movements
requiring precise, visual alignment and motor
control (eye-hand coordination).

4. In addition, an independent nonverbal 
reasoning factor (Factor VIII) was identified 
in certain of the printed tests. 

5. Future research should be directed at
validating the factors obtained against more
complex motor performance. Such research
might possibly lead to a better indication of
the unique contribution of tests in these areas
to problems of classification and selection as
well as to problems of evaluation and training.


Received January 4, 1954. 
Stop Signs or by Red Blinker Lights
At two intersections of one-way streets with
a two-way street in Brooklyn, New York, 
there were several serious traffic accidents in 
a short period of time. Members of the 
neighborhood protested the danger, and the 
police had a stop sign erected on the one- 
way streets. State law requires that auto- 
mobile drivers bring their cars to a full stop 
at or before any intersection marked with 
such a sign. But to what extent do drivers
obey this law? 

Three to four weeks after the signs had 
been erected, one of us (CH) recorded the 
actions of 170 automobiles at these intersec- 
tions and found that less than one-quarter of 
the cars showed full compliance with the law. 
Later observations were made with two ques- 
tions in mind. One was practical: Under 
what conditions, in this neighborhood, would 
drivers obey the law more fully? The other 
was theoretical: What light could be cast, by 
further research along these lines, on Allport’s 
theory of the J curve as representative of 
conformity behavior (1)? 


Procedure 


For all observations, E stationed himself in a shel- 
tered doorway where he would have a full view of 
the intersection, but would not be visible to the 
drivers of approaching cars. The observation time 
in each case was one hour. All cars which ap- 
proached the intersection from the side street were 
placed in one of the five following categories: 

1. Car came to a full stop with not more than 
one-half of its length past the line between the two 
corners. 

2. Car came to a full stop with more than one- 
half of its length past the line between the two 
corners. 

3. Car slowed down at the intersection but did 
not come to a full stop. 

4. Car did not slow down perceptibly. 


5. Car followed another car so closely that the 
driver was obliged to slow down or stop to avoid a 
collision. 

Cars in category 5 were excluded from later con- 
sideration. Whenever possible, other observations, 
such as the presence of a pedestrian, a policeman or 
police car, or traffic on the cross street, the condi- 
tion of the automobile, license plates indicating that 
the car was not a local one, and a description of the 
driver, were recorded. 

The original observations were made in two pe- 
riods, each lasting one hour. Further data were col- 
lected, for two hours, at the same corners three 
months later, and for one hour at a similar crossing 
where there was a red blinker light. (State law re- 
quires that in the presence of a red blinker light, 
just as with a stop sign, the driver should bring the 
car to a full stop before any part of it has crossed 
the intersection.) 


Results and Discussion 


The data are shown in Table 1 and in 
Fig. 1. 

Practical considerations. We may postulate
that two factors are involved in stopping at a
corner where there is a blinker or STOP sign
(assuming that the motorist has control of his
car): one involves the intention to conform
with the law, and the other involves seeing
the sign in time to obey it. Presumably, where
the car comes to a full stop past the
intersection, the second factor is lacking.
Comparing the first two categories for the
STOP sign and the blinker, we find that
virtually the same proportion of drivers
(one-half) came to a full stop in both cases, but
that a significantly larger proportion of the
drivers brought their cars to a full stop in
time when there was a blinker (chi square
gives p < .01). This indicates that the
greater visibility of the blinker is an
advantage to the law-abiding driver, but that
the moral effect of the blinker, in this population,
is little or not at all greater than the moral
Table 1
Stops and Slowdowns of Cars at Intersections with STOP Signs and with Blinker Lights











stop Sign— 

3-4 Weeks 
After Erection 
Category of
Response No. % No. 


stop Sign— 
3-4 Months 
After Erection 


Total, 
stop Signs 


Red 
Blinker 


% No. % No. %





Full stop with no 

more than half 

the car past the 

intersection line 
Full stop with more 

than half the car 

past the intersec- 

tion line 51 33 
Slowing down, but 

no full stop 53 62 
No perceptible slow- 

ing down 28 17 31 


Total 170 100 178 


19 
35 


17 21 
100 94 





effect of the stop sign. Since, for both 
blinker and stop sign, about one-fifth of the 
drivers passed the intersection without per- 
ceptible slowing down, it would seem that 
neither provided a reasonable safeguard 
against traffic accidents. 
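The significance statement above (chi square, p < .01) compares two proportions drawn from a 2 × 2 table. A minimal sketch of Pearson's chi-square for such a table follows, without a continuity correction (the text does not say whether one was used). The function name is hypothetical and the counts in the usage line are made up, not the study's data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for the table
    [[a, b], [c, d]], returned with its upper-tail p value.
    All row and column totals must be non-zero."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows = STOP sign, blinker;
# columns = full stop in time, full stop too late.
chi2, p = chi_square_2x2(10, 20, 20, 10)
print(round(chi2, 3), p < 0.01)  # → 6.667 True
```

A chi-square above about 6.63 with one degree of freedom corresponds to p < .01.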

Comparison of results when the STOP sign
was new and when it had been in place for
several months shows only small differences,
which were not significant at the .05 level of
confidence. This indicates that neither
novelty nor familiarity influenced the drivers,
or that the effects were too small to be
ascertained, or else that the two effects largely
cancelled each other.

[Figure: per cent of automobiles in each of four categories: full stop (prompt), full stop (too late), slowing down, no change of speed.]
Fig. 1. Conformity to traffic ordinances at corners marked with STOP signs and with red blinker lights, using four categories to measure conformity.

From on-the-spot notes, it was evident that 
the presence of a policeman on the corner re- 
sulted in greater compliance with the law, 
and that when traffic on the cross-street was 
heavier, cars were more likely to come to a 
full stop. No differences were observed be- 
tween out-of-state cars and New York cars, 
between New York cars which carried local 
license plates and those which did not, be- 
tween old cars and new cars, or between cars 
in good external condition and those which 
looked shabby. 

Theoretical considerations. According to
Allport’s well-known theory of conforming
behavior, we might have expected that (a)
50 per cent or more of the automobiles would
conform to the traffic code and stop before
crossing the intersection, thus being listed in
our first category, and (b) the distribution
of cases in the other three categories would
show a negatively accelerated decrease, the
resulting curve looking like a reversed letter
J without the final upswing. A glance at
Fig. 1 shows that this did not occur. If the
first two categories are pooled, the modified
curve bears a resemblance to Allport’s J for
the drivers at the corner with the blinker, but
not for the drivers at the corner with the STOP
sign (Fig. 2). However, the differences between
the two curves are not statistically significant.
Moreover, not even for the drivers at the
corner with the blinker do we find so sharp
a decline as in the classical J.

[Figure: per cent of automobiles in each of three categories (full stop, slowing down, no change of speed), plotted separately for the STOP sign and the red blinker.]
Fig. 2. Conformity to traffic ordinances at corners marked with STOP signs and with red blinker lights, using three categories to measure conformity.
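The two expectations derived from Allport’s theory, (a) and (b) above, can be stated as a small check. This is a sketch under an assumed reading of “negatively accelerated decrease” (each successive drop smaller than the one before it); the function name and the example percentages are hypothetical, not the study's data.

```python
def fits_allport_j(percentages):
    """Check a conformity distribution against one reading of Allport's J curve.

    percentages: per cent of cases in each category, ordered from most to
    least conforming. Assumed criteria (an interpretation, not Allport's own
    test): the first category holds at least 50 per cent, and the remaining
    categories decline, with each drop smaller than the one before it.
    """
    first, rest = percentages[0], percentages[1:]
    if first < 50:
        return False
    drops = [earlier - later for earlier, later in zip(rest, rest[1:])]
    if any(drop <= 0 for drop in drops):
        return False
    return all(d2 < d1 for d1, d2 in zip(drops, drops[1:]))


print(fits_allport_j([60, 25, 10, 5]))   # → True
print(fits_allport_j([25, 25, 30, 20]))  # → False
```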

It is tempting to speculate that the reason 
for the discrepancy between the results antici- 
pated from Allport’s theory and the results 
obtained is not that the theory is unsound, 
but rather that the situation is too complex 
to be stated so simply. We shall suggest a 
single complicating factor, although there are 
probably many more which could be brought 
out by careful analysis. 

Perhaps in New York City, where these ob- 
servations were made, one norm to which 
automobile drivers tend to conform is the 
traffic code; but another norm is the cultural 
imperative of “Hurry!” The two are not 
always opposed: even New York City drivers 
know that careless driving is likely to result 
in an accident, which in turn results in long 
delays. Thus the two norms would not (to
turn to Lewin’s vector analogy) pull in
opposite directions; but they would pull in
different ones. The solid line, and to a lesser
extent the broken line, of Fig. 2, might rep- 
resent conformity behavior which has been 
“pulled out of shape” by the presence of a 
second, partially conflicting norm which ap- 
plies to the same behavior. 


Conclusions 


1. In the neighborhood that was studied, 
only about one-half of the drivers stopped 
their cars at intersections marked with a red 
blinker light or with a stop sign, although a 
full stop is required by law. 

2. A significantly greater percentage of the 
drivers stopped their cars too late (that is, 
past the line of intersection) at the stop sign 
than at the blinker. This is interpreted to 
mean that the blinker is more readily visible 
from a distance, and thus makes conformity 
with the law easier. 

3. There was no significant difference in 
the percentage of drivers who stopped or
slowed down at the stop sign or at the red 
blinker. 

4. There was no significant difference be- 
tween the percentage of drivers who stopped 
or slowed down for a newly erected sign or 
for one that had been standing for several 
months. 

5. Some of the data show a fair approxi- 
mation to Allport’s J curve, and some of them 
differ markedly from the J curve. It is sug- 
gested that the J-curve hypothesis should be 
supplemented, rather than discarded. 
Recent years have seen the development of
an increasing number of supervisory train- 
ing programs among industrial organizations. 
These programs deal in large measure with 
the area broadly defined as “human rela- 
tions.” Essentially, these training courses 
must be viewed as attempts to modify or 
“improve” the behavior of supervisors in 
dealing with their groups in their everyday 
working relationships. 

Until recently, however, little evidence was 
available concerning the actual outcomes of 
such training. What few evaluations had 
been made were often quite limited, dealing 
for example with trainees’ or supervisors’
impressions of the value of the training or with
indices of supervisory attitudes immediately
before and after the training course. In a
recent study by Fleishman (1), an attempt 
was made to obtain a more meaningful 
evaluation of a human relations training pro- 
gram for foremen in a large industrial organi- 
zation. Supervisory training was evaluated 
in terms of descriptions of the foreman’s 
leadership behavior as well as his own lead- 
ership attitudes. Moreover, evaluations were 
made back in the actual plant situation some- 
time after the foremen had returned from 
training. 

The present paper represents an extension 
of this previous research and describes a fur- 
ther study of some additional effects of such 
training in the same motor truck manufac- 
turing plant. 


1 The study was carried out at the Personnel Re- 
search Board, The Ohio State University, with the 
cooperation of the International Harvester Com- 
pany. The opinions or conclusions contained in this 
report are those of the authors and are not to be 
construed as reflecting the views or indorsement of 
the Department of the Air Force. 


Procedure 


The problem. The previous study compared 
matched groups of foremen in the plant situation. 
One group had not been sent to training, whereas the remaining three groups differed in the amount of time that had elapsed since training. In general, the results
showed that in terms of the measures used, the ef- 
fects of the training were minimal and certainly did 
not last when evaluated back in the plant. Except 
for the most recently trained group (which differed 
in an unexpected direction), over-all differences be- 
tween the trained and untrained foremen in leader- 
ship attitudes and behavior were not significant. 

In the present study it was possible to employ a “longitudinal” methodology in that measures of leadership attitudes and behavior were obtained on the same groups of foremen before and after the training period. The purpose here was to compare the results of this analysis with those obtained from the “cross-sectional” approach used by Fleishman (1).

A second problem investigated was the effect of a 
“refresher” human relations training course on the 
behavior and attitudes of foremen who had re- 
ceived the original and more extensive training course 
some time before. 

Both of the above evaluations were made with a 
view to a third problem. The concern here was 
with the stability of leadership behavior patterns 
over a period of time for those foremen who had 
received human relations training as compared with 
those foremen who had not received such training. 
Aside from the training implications, data on the 
stability of leadership patterns in complex organi- 
zations would be of considerable interest in them- 
selves. Although the problem appears to be a cru- 
cial one, little evidence on this is currently available. 
However, such data would have direct bearing on 
the extent to which one can generalize from ob- 
servations of leadership behavior made at one point 
in time to later behavior of the same individual in 
the same general situation. Or in other terms, 
within a given organizational framework, to what 
extent can we predict future behavior of the leader 
from knowledge of present or past leadership be- 
havior? 

Table 1

Comparison of Supervisory Behavior Description Scores for Foremen With and Without Intervening Central School Training

                                                  Before Training    After Training    Test-
Group                   Leadership Dimension      Mean     SD        Mean     SD       Retest r
With intervening        Consideration             70.6     13.8      72.1     13.0     .27
training (N = 39)       Initiating Structure      43.2      6.4      40.8     [?]      .22
Without intervening     Consideration             73.8     15.0      74.4     [?]      .58
training (N = 59)       Initiating Structure      41.5      7.1      39.6      6.1     .46

The research instruments. The Supervisory Behavior Description (2) and the Leadership Opinion Questionnaire (3), whose development has been described in detail previously, were the primary instruments used in the study. Briefly, the Supervisory Behavior Description contains 48 items which
describe how supervisors operate in their leadership 
role. The questionnaire is scored on two reliable 
and factorially independent dimensions called “Con- 
sideration” and “Initiating Structure.” A high score 
on the Consideration dimension characterizes super- 
visory behavior indicative of friendship, mutual 
trust, respect, a certain warmth between the super- 
visor and his men, and consideration of their feel- 
ings. A low score on this dimension indicates the 
supervisor to be more authoritarian and impersonal 
in his relations with group members. This dimen- 
sion comes closest to reflecting the “human relations” 
aspect of group leadership. The Initiating Structure 
dimension reflects the extent to which the supervisor 
defines or facilitates group interactions toward goal 
attainment. A high score on this dimension charac- 
terizes supervisors who play a more active role in 
directing group activities through planning, com- 
municating, scheduling, criticizing, trying out new 
ideas, etc. The questionnaire is typically filled out 
by group members who mark for each item how 
frequently their own supervisor does what each item 
describes. 

The Leadership Opinion Questionnaire contains 40 
parallel items and is filled out by the supervisor him- 
self. It is scored along the same two dimensions 
and reflects the supervisor's own attitudes about 
how work groups should be led. 

Leadership patterns before and after training. In 
this phase of the study, only the Supervisory Be- 
havior Description questionnaire was involved. At 
least three workers drawn randomly from the work 
groups under each of 98 different foremen filled out 
the questionnaire describing the behavior of their 
own foremen. Similar data had already been col- 
lected on these foremen eleven months previous to 
this administration. Of the 98 foremen, 39 had 
since been sent to the company’s Central School 
which administers the Supervisory Training Program for foremen from all its diverse plants.2 The remaining 59 had received no such training during this same period. No selective factors could be found which determined the order in which foremen were sent to this training, and the two groups may be considered comparable.

2 The program involves eight hours a day for two weeks of intensive training. Group discussion, lecture methods, visual aids, and a variety of techniques are employed. For a description of the program and its workings, see Walker (6).


Results


Table 1 presents a comparison of the two groups in terms of each of the two leadership dimensions.

It is clear from Table 1 that no significant differences existed in mean scores made by either group before and after the training period. The mean scores of the two groups on each leadership dimension did not differ significantly before training (indicating adequate matching), nor did they differ significantly after the training period. Moreover, no significant changes in mean scores occurred within either group during the 11-month interval, whether or not the foremen had been sent to training.

Perhaps the most striking results in Table 1 bear on the test-retest coefficients for the trained group as compared with the group without training during the same period. The correlations between administrations for the trained group do not reach the 5% level of confidence, whereas those for the group without intervening training are significant beyond the 1% level. This trend occurs in the case of both the “Consideration” and “Initiating Structure” dimensions. Thus, the data would seem to indicate that the intervention of training in some way affects the stability of leadership patterns. Although the differences between mean scores before and after training are not statistically significant, there is apparently a differential effect of such training on the behavior of different foremen within the trained group. This suggestive finding will be pursued later.
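
The significance statements above can be checked with a short sketch. It assumes that the criterion was the usual t test of a Pearson correlation against zero, t = r√(N − 2)/√(1 − r²) with N − 2 degrees of freedom; the text does not name the test. The r values are the test-retest coefficients reported in this paper (.27 and .22 for N = 39; .58 and .46 for N = 59), and the two critical values are rounded figures from standard two-tailed t tables, supplied here as assumptions.

```python
import math

# Two-tailed critical values, rounded from standard t tables (assumed here).
T_CRIT_05_DF37 = 2.026   # 5% level, df = 39 - 2
T_CRIT_01_DF57 = 2.665   # 1% level, df = 59 - 2


def t_for_r(r, n):
    """t statistic for H0: rho = 0, given a sample correlation r on n cases."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


for label, r, n, crit in [
    ("trained, Consideration", 0.27, 39, T_CRIT_05_DF37),
    ("trained, Initiating Structure", 0.22, 39, T_CRIT_05_DF37),
    ("untrained, Consideration", 0.58, 59, T_CRIT_01_DF57),
    ("untrained, Initiating Structure", 0.46, 59, T_CRIT_01_DF57),
]:
    t = t_for_r(r, n)
    print(f"{label}: r = {r:.2f}, N = {n}, t = {t:.2f}, exceeds {crit}: {t > crit}")
```

Under these assumptions the trained-group coefficients give t of roughly 1.7 and 1.4 (below 2.026), while the untrained-group coefficients give roughly 5.4 and 3.9 (above 2.665), which agrees with the pattern described in the text.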

With regard to the stability of leadership 
patterns for the foremen without intervening 
training, some stability was demonstrated 
(r = .58 for Consideration and .46 for Initi- 
ating Structure). However, these correlations 
should not be interpreted as reflecting the 
upper limit of the stability of leadership 
behavior over this particular time interval. 
Undoubtedly, some of the variance is at- 
tributable to, among other things, the use of 
different workers to describe the same fore- 
men in each of the two administrations. 
Some idea of the effects of using different 
workers can be gained from estimates of in- 
terrater agreement in describing the same 
foremen. 

Table 2 presents the coefficients of inter- 
rater agreement estimated by the method de- 
scribed by Horst (4). This was done sepa- 
rately for the trained and untrained groups 
of foremen. 
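
Horst's procedure is not reproduced here. As a plainly swapped-in illustration of the same idea (agreement among several workers describing one foreman, with possibly unequal numbers of workers per foreman), the sketch below computes a standard one-way intraclass correlation, ICC(1), from a one-way analysis of variance. The function name and input layout are hypothetical, and the result is not claimed to equal Horst's coefficient.

```python
def icc1(groups):
    """One-way intraclass correlation ICC(1) with unequal group sizes.

    groups: list of lists; each inner list holds the scores that different
    workers gave to one foreman (at least two foremen are required).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if k < 2 or n_total - k < 1:
        raise ValueError("need at least two foremen and some replication")

    grand = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-foremen and within-foremen (rater disagreement) sums of squares.
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective number of raters per foreman when group sizes differ.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

If every worker gives his foreman exactly the same score, the coefficient is 1.0; if all foremen receive the same average description, it falls to zero or below, since any remaining variation is then disagreement among raters.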

These coefficients, although indicative of significant agreement among respondents, are low enough to support the idea that the use of different samples had the effect of lowering the “test-retest” coefficients. Furthermore, one would not expect the “test-retest” coefficients obtained to be any higher than the interrater agreement. Table 2 also indicates that the interrater agreement coefficients for workers who described trained foremen are comparable to those for workers who described the untrained foremen. Hence, there is nothing in these results to minimize the importance of the finding that the stability of leadership patterns is lower for the trained foremen than for the untrained foremen.

Table 2

Interrater Agreement Among Workers Describing the Same Foremen on the Supervisory Behavior Description

                              Coefficient of Agreement
                              For Foremen Without       For Foremen With
Leadership Dimension          Intervening Training      Intervening Training
Consideration                 .55                       [?]
Initiating Structure          .50                       .48


The Effects of a Refresher Training Course 


In addition to sending foremen to the com- 
pany’s Central School, this particular plant 
subsequently organized a refresher leadership 
course for the foremen who had already at- 
tended the original and more extensive cen- 
tralized program. This refresher course was 
organized in conjunction with a local col- 
lege. Since the purpose of this program was 
to “reinforce” the material presented in the 
Central School, the courses were organized 
around the same subject matter, although in 
somewhat shorter form. The refresher course 
lasted one week and used the lecture-discus- 
sion approach, with classes averaging around 
25 foremen. It was decided to see if attend- 
ing the refresher course resulted in any meas- 
urable change in leadership behavior or atti- 
tudes back in the plant. Although over-all 
results from the original course appeared to 
be minimal, the additional course might still 
have produced some effects. This seemed possible, especially in view of the apparent differential effects found in the stability analysis described above and of the fact that this “refresher” training was given closer to the actual work situation.


Procedure 


Two groups of 31 foremen were selected, each of 
whom had been to the Central School in Chicago. 
Measures of the leadership attitudes and behavior 
of each of the foremen had already been obtained. 
After these measures were obtained, one of the 
groups (experimental) attended the refresher train- 
ing course, while the other group (control) did not. 
Information about attitudes and behavior after this 
course was obtained from a readministration of the 
same questionnaires. The average time since attend- 
ing the second course was 3.2 months (standard 
deviation, 1.6 months). 

Table 3

Comparison of Behavior and Attitude Scores of Foremen With (Experimental Group, N = 31) and Without (Control Group, N = 31) Refresher Training

                                                             Before Refresher    After Refresher
                                                             Training            Training
Questionnaire        Leadership Dimension    Group           Mean     SD         Mean     SD
Supervisory          Consideration           Experimental    73.0     12.7       70.9     12.7
Behavior                                     Control         75.5     13.2       72.6     [?]
Description          Initiating Structure    Experimental    42.8     [?]        40.7     [?]
                                             Control         40.3     [?]        37.5     [?]
Leadership           Consideration           Experimental    54.1     [?]        54.3     [?]
Opinion                                      Control         56.0     [?]        [?]      [?]
Questionnaire        Initiating Structure    Experimental    52.9     [?]        [?]      [?]
                                             Control         52.6     [?]        [?]      [?]

The groups were matched on the mean scores achieved in the initial administration of the Supervisory Behavior Description (filled out by workers) and the Leadership Opinion Questionnaire (filled out by the foremen), as well as on the variables of age, education, years as a supervisor, seniority, number
of men supervised, and months since attending the 
Central School. Differences between these groups 
were also checked in terms of the leadership of the 
foremen’s own bosses. This was done since Fleish- 
man (1) had found previously that the “leadership 
climate” under which the foremen themselves op- 
erate is a potent variable related to the foreman’s 
own leadership attitudes and behavior. Since any 
change in foreman behavior or attitudes might pos- 
sibly be a function of a corresponding change in 
their bosses’ behavior or expectations, it was impor- 
tant that this be checked in both the pre- and post- 
training administrations. Suffice it to say here that 
no differences were found between the two groups 
at either administration for these “climate” variables.3


Results 


Table 3 summarizes the mean behavior 
and attitude scores obtained before and after 
the refresher training for the experimental 
(trained) and control (untrained) groups. 

It can be seen from Table 3 that no striking differences appear in the data obtained from the experimental group relative to those obtained from the control group. The data
in this table were evaluated in several ways. 
The first evaluation compared the mean be- 
havior and attitude scores made by the con- 
trol and experimental groups after training. 
In this comparison, none of the differences 
on either leadership dimension were signifi- 
cant at the 5% level of confidence. A second 
method of evaluation analyzed the data to see if there were significant changes within the refresher group relative to those within the control group. In this analysis, one statistically significant change was noted. This, however, was a significant drop in the mean “structure” behavior score for those foremen comprising the control rather than the experimental group. None of the other critical ratios approached statistical significance.

3 For a discussion of the measures used to evaluate “leadership climate,” see Fleishman (2, 3).

A third and more rigorous method of analysis employed a statistical regression technique. In this method, the means, sigmas, and correlation between the first and second administrations for the control group are used to establish a regression equation which predicts the second score on the basis of the first score. This equation is then used to predict the second score for each of the members of the experimental group, given their first score. This provides the best prediction of the second score if the intervening training has had no effect. The differences between the predicted and obtained scores are summed algebraically and treated for significance. A significant difference implies that the introduction of training has resulted in a significant change in the scores of the experimental group.4 The results of this analysis in the present instance showed none of the differences to be statistically significant.

4 This method does not assume that the initial scores of each group are the same; it makes allowance for this, and thus obviates the necessity of having perfectly matched groups. See Peters and Van Voorhis (5).
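
The regression procedure just described can be sketched as follows. The control group's means, standard deviations, and correlation fix the line predicting second scores from first scores (slope = r · s_post / s_pre); each experimental foreman's obtained second score is then compared with his predicted one. The final significance step is an assumption: the text says only that the summed differences were "treated for significance," so a one-sample t test on the mean discrepancy (df = n − 1) is used here for illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def control_regression(pre, post):
    """Intercept and slope predicting post from pre, built from the control
    group's means, standard deviations (sigmas), and correlation."""
    slope = pearson_r(pre, post) * stdev(post) / stdev(pre)
    intercept = mean(post) - slope * mean(pre)
    return intercept, slope


def regression_check(ctrl_pre, ctrl_post, exp_pre, exp_post):
    """Return (sum of obtained-minus-predicted scores, t) for the experimental group.

    Assumption: a one-sample t test on the mean discrepancy (df = n - 1)
    stands in for the unspecified significance test.
    """
    a, b = control_regression(ctrl_pre, ctrl_post)
    diffs = [y - (a + b * x) for x, y in zip(exp_pre, exp_post)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return sum(diffs), t
```

A positive sum means the experimental group's second scores ran above what the control-group regression predicts for foremen with the same first scores; because the prediction is conditioned on each man's first score, the two groups need not start from identical means.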

The results of these evaluations of the re- 
fresher course may be viewed as largely nega- 
tive in terms of changes in mean behavior and 
attitude scores resulting from training. However, one other set of results qualifies a completely negative conclusion. This is the correlation between the scores made by the foremen after the course and the scores they made before training. Table 4 summarizes these.

We find in Table 4 that those foremen who 
have had the refresher training showed con- 
siderably less pre- and postscore agreement 
than did those foremen who had not attended. 
This is true for both the foreman’s leadership 
attitudes and behavior along both the Con- 
sideration and Initiating Structure dimensions. 
Confidence in these results is increased when 
we recall that our estimates of behavior and 
attitudes were derived independently (from 
reports of workers and foremen, respectively).
Thus, these results indicate greater stability 
of leadership patterns for those foremen who 
did not have this intervening refresher train- 
ing. 

Table 4

Coefficients of Leadership Stability

                                                    Foremen With              Foremen Without
                                                    Intervening Refresher     Intervening Refresher
Questionnaire            Leadership Dimension       Training (N = 31)         Training (N = 31)
Supervisory Behavior     Consideration              [?]                       .56
Description              Initiating Structure       [?]                       .53
Leadership Opinion       Consideration              [?]                       .80
Questionnaire            Initiating Structure       [?]                       [?]

This result confirms the earlier results obtained when we compared the stability of leadership patterns for foremen who had been to the original Central School with that for foremen who had not had this training. These stability coefficients were .58 (for Consideration behavior) and .46 (for Initiating Structure behavior) for the untrained group. For the group with intervening training, these shrink to .27 and .22 for Consideration and Initiating Structure, respectively. Again, as in the case of the refresher course, the lower agreement between scores was in the group which had the intervening program.


Summary and Conclusions 


A supervisory training program was evalu- 
ated in terms of changes in the leadership 
behavior and attitudes of the trainees back 
in the work situation. Scores made on ques- 
tionnaires administered before training were 
compared with scores obtained after training 
for an experimental group (with intervening 
training) and a control group (without inter- 
vening training). The questionnaires em- 
ployed were the Supervisory Behavior De- 
scription (worker descriptions of foreman 
behavior) and the Leadership Opinion Ques- 
tionnaire (foreman’s own leadership atti- 
tudes). Each questionnaire yields a score on 
two reliable and factorially independent di- 
mensions called Consideration and Initiating 
Structure. The same general methodology 
was used to evaluate an original supervisory 
training course and also a refresher training 
course. 

The results generally confirm previous find- 
ings in the same plant (1) that in terms of 
mean scores before and after training, the ef- 
fects of such training appear minimal when 
evaluated back in the plant. This was true, 
in the present instance, for both the original 
and refresher training courses. 

However, other findings in the present 
study must qualify any completely negative 
conclusion regarding the effectiveness of the 
training. These findings bear on the stability 
of leadership patterns for individual foremen. 
It was found, for example, that relatively con- 
sistent patterns of leadership behavior and 
attitudes existed over time for foremen who 
had not been sent to training. This was 
indicated by test-retest correlations between 
questionnaire administrations for the control 
groups. However, for foremen who had in- 
tervening training, a much lower coefficient 
of agreement was found between questionnaire administrations. This was found for the intervening refresher course as well as for the original course, and was found for leadership behavior as well as attitudes.

Human Relations Training and Leadership Patterns 25

These results are consistent with the previ- 
ous finding in the training situation itself that 
wide individual differences exist among fore- 
men in the leadership attitudes they hold 
after training (1). Moreover, large indi- 
vidual shifts in scores occur in both direc- 
tions. From the point of view of training 
evaluation research, one cannot assume that 
insignificant changes in group means among 
trained foremen are indicative of no training 
effects. The problem appears more compli- 
cated than that. It raises the possibility of 
differential effects according to the individual 
and the situation in which he finds himself. 
Future training research might well be directed toward finding the personal and situational variables which interact with the effects of such training.


Received February 1, 1954. 
Counselors seldom complain about a shortage of clients. Rather, appointment books
are usually filled for a week, two weeks, or 
even a month in advance. Probably this 
state of affairs underestimates the demand, 
and if more time could be found, the num- 
ber of applicants for counseling would in- 
crease rapidly. 

But a realistic look at the situation reveals 
that hopes of alleviating this pressure through 
an increased staff are small because of both 
the lack of trained counselors and necessary 
budget restrictions. One solution that sug- 
gests itself is dealing with clients in groups. 

Stone (7) and Richardson and Borow (6) 
have demonstrated the usefulness of group 
procedures in preparing clients for individual 
vocational counseling. Although a few ex- 
periential and theoretical reports have ap- 
peared in the literature (e.g., 1, 4), little ex- 
perimentation with the use of groups in voca- 
tional and educational planning has been 
done. 

A recent report of such an experiment (2) 
showed that high school seniors who participated in a group vocational guidance program made just as realistic vocational choices
as did those who had individual counseling. 
Unfortunately, a control group was not pro- 
vided, so that these results could be at- 
tributed to an equal ineffectiveness of both 
procedures. An unpublished study by Nis- 
senson (5) investigated the effects of a group 
vocational guidance program with high school 
boys. The study was well controlled, and 
the major findings indicated that the program 
was effective in fulfilling the following objec- 
tives: (a) an increased awareness on the part 
of the subjects (Ss) of the need for help in 
career planning; (b) an increased use of individual counseling services; (c) an increased reading of occupational literature; and (d) an increased understanding of the occupations tentatively chosen.

1 Now at the Counseling Center, Kansas State College.

Such studies provide us with justifications 
for looking further into the matter of working 
in groups with vocationally undecided stu- 
dents. The present study is a step in that 
direction. 


Objectives of the Group Program 


A committee of the University of Minnesota Student Counseling Bureau counselors2
met to consider the question of objectives for 
a group vocational guidance program. It was 
decided that if we hope to make use of groups 
as economical substitutes for individual vo- 
cational counseling, the objectives should be 
similar in both types of programs. Four objectives were designated: to increase (a) satisfaction with vocational choice, (b) certainty of vocational choice, (c) realism of vocational choice, and (d) the appropriateness of certainty in terms of realism.

The latter objective was included because 
it was felt that if an individual merely be- 
came more certain of an unrealistic choice 
after counseling, the counseling was not effective. Therefore, we felt that (a) for realistic choices, certainty should increase, and (b) for unrealistic choices, certainty should decrease.

Hypotheses 


The major hypothesis of the study was 
that participation in a group vocational guid- 
ance program is associated with achievement 
of the four objectives listed above. 

A second hypothesis was that participation 
in an individual vocational counseling pro- 
gram is associated with achievement of the 
four objectives. 

2 The author wishes to express his appreciation to Dr. Ralph Berdie and the entire Student Counseling Bureau staff for their cooperation in planning and carrying out this research.
Thirdly, it was hypothesized that students
participating in groups would not differ from 
students participating in individual counsel- 
ing in the achievement of the four objectives. 


Method 


Sample. Names of freshman men who had indi- 
cated that they were undecided about their voca- 
tional plans at the beginning of the school year were 
obtained from a student information sheet collected 
by the Office of the Dean of Students. These stu- 
dents were invited to participate in the group pro- 
gram which had been planned. Students who were 
interested returned a post card on which they indi- 
cated what their tentative vocational choice was, 
how certain they felt of it, and how satisfied they 
were with this choice. A total of 191 returns were 
obtained. 

Those students who were unable to attend at the 
scheduled times were eliminated from the sample 
and invited to make individual counseling appoint- 
ments. In addition, records in the Student Counsel- 
ing Bureau were checked, and students who had been 
counseled there earlier in the year or who had in- 
complete test records were also eliminated from the 
sample. The remaining 60 students composed the 
sample. 

Procedure. These 60 students were divided into three groups: 30 Group Experimentals, who were to participate in the groups; 15 Individual Experimentals, who were to receive the typical individual vocational counseling provided at the Student Counseling Bureau; and 15 Controls, who were to participate in neither the group nor the individual program during the experimental period. Assignments to these groups were made by a random method.

The Individual Experimentals and the Controls 
were individually contacted by telephone, and it 
was explained that the requests to participate in the 
group program had exceeded our expectations and 
that the groups were already full. Individual Ex- 
perimentals were invited to take advantage of the 
individual counseling facilities of the Counseling Bu- 
reau and appointments were made for them. An 
apology was offered to the Controls, and a promise 
to include them in the next quarter’s program was 
made (and subsequently kept). 

The experimental treatment for the Individual Experimentals consisted of the typical counseling methods of the Student Counseling Bureau (8). Fourteen of the 15 students kept the first appointment,3 and the average number of appointments was 2.6. Thus, about 39 counseling hours were used on this group.

3 The one student who failed to keep his appointment was retained in the experimental sample in order to keep motivation for counseling equal for control and experimental groups.

For the Group Experimentals, the procedure was as follows: The students first attended an introductory lecture of about 30 minutes in which the general problem of choosing a vocation was discussed. Brief mention was made of the importance of a knowledge of one’s own abilities, interests, values, and personality characteristics, as well as a knowledge of jobs. Pitfalls in self-analysis were pointed out, and the values of group discussion and of the give and take within a group were stressed.

Then small discussion groups of from five to seven 
students were formed, and individual counselors 
acted as group leaders. Each student was provided 
with a copy of his Strong Vocational Interest test 
results, and frequently this provided the stimulus 
material for discussion. Just as frequently a short 
questionnaire the students had filled out prior to 
the group lecture stimulated the discussion. After 
about 30 minutes the students were told that the 
time for the meeting had ended, and arrangements 
were made for further group meetings for those who 
were interested. 

Twelve groups were formed. A total of 79 stu- 
dents participated in these groups. Fifty-four of 
these students were not included in the experiment 
for one or more of the reasons cited earlier; the 
other 25 were originally selected as Group Experimentals.4 The 12 groups met an average of 2.3
times after the first meeting with a range of from 
one to six hours, so that about 34 hours of counselor 
time were involved in dealing with these 79 students. 

Six weeks after the last meeting of any group, all 
60 experimental Ss were again contacted and asked 
to fill out another card indicating their current vo- 
cational choice, how certain they were of it, and how 
satisfied they were with it. A 100 per cent return 
was obtained. In order to avoid as much as pos- 
sible the “hello-goodbye” effect (3), the students 
were told that this was a study of the development 
of vocational choices during the freshman year. No 
mention was made of the experiment except that the 
students were reminded that information concerning 
their vocational choices had been collected earlier 
for a different purpose. 

Measuring instruments. Changes in certainty and 
satisfaction could readily be analyzed since the stu- 
dent himself provided a quantitative rating on an 
11-point scale. Realism of vocational choice was not 
so easily measured. The clinical judgment of experi- 
enced vocational counselors was used.5 The data 
provided to these judges were as follows: (a) scores 
on the Strong Vocational Interest test; (b) scores on 
the 1947 edition of the ACE psychological examina- 
tion; (c) the student’s high school rank; (d) scores 
on the Cooperative English test, Form S; (e) grades 
earned during the first year in college; (f) special aptitude tests, including the General Aptitude Test Battery for 21 students, the Ohio Psychological for 24 students, and engineering aptitude tests for 13 students; (g) a biographical data sheet, which included a summary of work experiences, hobbies, family background, and extracurricular participation; and (h) the two vocational choices made by the student. Naturally the judges did not know if they were judging an experimental or a control S; neither did they know which choice represented the pretest and which represented the posttest.

4 Five students originally selected as Group Experimentals attended no meetings, but they were included as part of the sample in order to keep motivation for counseling constant for control and experimental groups.

5 Special thanks are due to Miss Alice Christian, Dr. Theda Hagenah, Dr. Vivian Hewer, Mr. James Lyon, Dr. Mabel Powers, and Dr. Cornelia Williams for contributing their time to this part of the project.

A realistic choice was defined as one in which the 
probability was high that the student could com- 
plete the necessary training, could find employment 
in the chosen job, and would succeed and remain in 
this work over a period of years. The judges were 
asked to rate each choice as to whether or not it 
was realistic. Then they were asked to indicate how 
certain they were of this rating on an 11-point cer- 
tainty scale. If a choice was rated as “unrealistic” 
and the counselor making the rating was “very cer-
tain,” the choice was given a score of −11. If the
choice was rated as “realistic” and the judge was
“very certain,” the choice was given a score of
+11. Two judges rated every choice, and the score
for a given choice was the sum of the two ratings.
The agreement between the judges was reasonably
high (r = .72). Of the 85 choices rated, the judges
disagreed as to whether or not a choice was realistic 
15 times, or 17.6 per cent of the time. In cases of 
disagreement, a choice was called realistic if the sum 
of the ratings was positive, and unrealistic if the 
sum of the ratings was negative. 
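The scoring rule for realism can be restated as a short function. This is a minimal sketch, not the procedure the judges actually followed: it assumes each judge's rating is recorded as the certainty value (assumed here to run from 1 to 11) carrying a plus sign for “realistic” and a minus sign for “unrealistic,” and the names `judge_score` and `score_choice` are hypothetical. The article does not say how a sum of exactly zero was classified, so the sketch labels that case “undecided.”

```python
def judge_score(realistic: bool, certainty: int) -> int:
    """One judge's signed rating: +certainty if judged realistic, -certainty if not."""
    if not 1 <= certainty <= 11:
        raise ValueError("certainty must lie on the 11-point scale (assumed 1..11)")
    return certainty if realistic else -certainty


def score_choice(ratings):
    """Sum the judges' signed ratings and classify the choice by the sign of the sum."""
    total = sum(judge_score(realistic, certainty) for realistic, certainty in ratings)
    if total > 0:
        label = "realistic"
    elif total < 0:
        label = "unrealistic"
    else:
        label = "undecided"  # assumption: the article does not cover a zero sum
    return total, label


# The two judges disagree; the more certain "realistic" rating decides the sum.
print(score_choice([(True, 9), (False, 4)]))  # (5, 'realistic')
```

With two judges the score ranges from −22 to +22, which is why a disagreement is settled by whichever judge was more certain.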

Appropriateness of certainty in terms of realism 
was measured as follows: If the choice was origi- 
nally realistic and was changed to an unrealistic 
choice, or if both choices were unrealistic, any in-
crease in certainty was scored negatively, and any
decrease was scored positively. If the choice was 
originally unrealistic and was changed to a realistic 
one, or if both choices were realistic, increases in 
certainty were scored positively and decreases were 
scored negatively. 
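Written out case by case, the appropriateness rule collapses to a single test: an increase in certainty counts positively when the posttest choice is realistic and negatively when it is unrealistic, whatever the pretest choice was. The sketch below, with hypothetical function names, encodes the four cases literally and checks that equivalence. It assumes the score carries the size of the change in certainty; the article states only the direction of scoring.

```python
from itertools import product


def certainty_direction(pre_realistic: bool, post_realistic: bool) -> int:
    """+1 if an increase in certainty is scored positively, -1 if negatively."""
    if pre_realistic and not post_realistic:      # realistic -> unrealistic
        return -1
    if not pre_realistic and not post_realistic:  # both choices unrealistic
        return -1
    if not pre_realistic and post_realistic:      # unrealistic -> realistic
        return 1
    return 1                                      # both choices realistic


def appropriateness_score(pre_certainty, post_certainty, pre_realistic, post_realistic):
    """Signed change in certainty (assumption: magnitude = size of the change)."""
    change = post_certainty - pre_certainty
    return certainty_direction(pre_realistic, post_realistic) * change


# The four cases reduce to one question: is the posttest choice realistic?
for pre, post in product([True, False], repeat=2):
    assert certainty_direction(pre, post) == (1 if post else -1)

# A student who became less certain about an unrealistic choice gains credit.
print(appropriateness_score(6, 3, False, False))  # 3
```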

Six students (1 Individual Experimental, 1 Con- 
trol, and 4 Group Experimental) were unable to 
specify even a tentative vocational choice at the be-
ginning of the experiment. They were arbitrarily
given a rating of 1 on certainty and satisfaction, 
but were not included in the analysis of the other 
two variables. 


Results 


Table 1 summarizes the changes found on
the four variables. It is apparent from the
table that mean scores on all four variables
increased for both the Individual and Group
Experimentals. There were slight decreases
in the mean certainty, satisfaction, and real- 
ism scores for the Controls, and a slight in- 
crease in the mean score on appropriateness 
of certainty in terms of realism. 

The significance of these trends was tested 
by the analysis of covariance. Posttest scores 
for each experimental group were compared 
with posttest scores for the Controls, with 
pretest scores held constant. 
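The covariance adjustment described above can be sketched in a few lines: the posttest is regressed on the pretest within each group, the within-group slopes are pooled, and each group's posttest mean is moved to where it would fall at the grand pretest mean. The scores below are invented for illustration, the function name `adjusted_means` is hypothetical, and the F test behind the reported significance levels is omitted.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariance-adjusted posttest means.

    groups: dict mapping group name -> list of (pretest, posttest) pairs.
    Returns the pooled within-group slope and each group's posttest mean
    adjusted to the grand pretest mean.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group regression of posttest on pretest
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    return slope, {
        name: mean(y for _, y in pairs) - slope * (mean(x for x, _ in pairs) - grand_x)
        for name, pairs in groups.items()
    }


# Invented scores: the raw posttest difference (7) overstates the adjusted one (3),
# because the experimental group also started higher on the pretest.
slope, adj = adjusted_means({
    "control": [(1, 2), (3, 4)],
    "experimental": [(5, 9), (7, 11)],
})
print(slope, adj)  # 1.0 {'control': 5.0, 'experimental': 8.0}
```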

When the Group Experimentals are compared with the Controls, differences significant beyond the .01 level are found on both certainty and satisfaction. The difference on realism was significant between the .05 and .01 levels. On appropriateness of certainty in terms of realism no significant difference was found (p > .10). These data as a whole corroborate the first hypothesis.


Table 1

Means and Standard Deviations of Pre- and Posttest Scores on Four Criteria of
Success in Vocational Guidance

                                     Control             Individual          Group
                                                         Experimentals       Experimentals
                                     (N = 15)            (N = 15)            (N = 30)
Criteria                             Pretest  Posttest   Pretest  Posttest   Pretest  Posttest
Certainty                     M           5.40      5.33      5.80      7.60      4.93      6.50
                              SD           2.3       2.4       2.1       1.8       2.7       2.0
Satisfaction                  M           6.93      6.13      6.67      8.20      6.70         ?
                              SD           2.6       1.9       2.4       1.5       2.9         ?
Realism*                      M          17.50     16.36     19.57     24.07     16.23         ?
                              SD          16.0      15.8      15.7      14.4      13.8         ?
Appropriateness of certainty  M           8.71      9.43      8.86         ?      8.92     10.79
  in terms of realism*        SD           6.1       6.1       6.5         ?       6.0       8.0

* The N's for these variables were 14, 14, and 26, respectively (see text).


Table 2

Number and Percentage of Students Making Realistic Vocational Choice Before and After
Experimental Period

Group                                      Before           After
                                      No.       %     No.       %
Control (N = 14)                        5    35.7       5    35.7
Individual experimentals (N = 14)       6    42.9       8    57.1
Group experimentals (N = 26)            9    34.6      15    57.7

Similar results were obtained in comparing the Individual Experimentals with the Controls. Thus, differences favoring the experimental group were significant beyond the .01 level on both certainty and satisfaction. The difference on realism was significant between the .10 and .05 levels of probability. Again, no significant difference existed on appropriateness of certainty in terms of realism (p > .10). The second hypothesis appears to be generally corroborated.

When the two groups of experimental Ss 
are compared with each other, no significant 
differences were found. Thus, the final hy- 
pothesis is also corroborated. 

Changes on the realism criterion are pre- 
sented in a different way in Table 2. Here 
the number of Ss making realistic and un- 
realistic choices before and after the experi- 
mental period is presented. For the Control 
group, the percentage of realistic choices is 
35.7 on both occasions. This figure is 42.9 
before counseling and 57.1 after counseling 
for the Individual Experimentals. The Group 
Experimentals show a change in the percent- 
age of realistic choices from 34.6 to 57.7. 
This difference, while not statistically signifi- 
cant, is certainly in the expected direction. 
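The percentages above are counts of realistic choices divided by the numbers of students rated on realism (14 Controls, 14 Individual Experimentals, and 26 Group Experimentals). The counts themselves are not stated in the text. The sketch below, with hypothetical helper names, searches for the whole-number counts that reproduce each reported percentage.

```python
def percent(count: int, n: int) -> float:
    """Percentage of n, rounded to one decimal as in the text."""
    return round(100 * count / n, 1)


def counts_matching(pct: float, n: int):
    """All whole-number counts out of n that round to the reported percentage."""
    return [c for c in range(n + 1) if percent(c, n) == pct]


reported = {
    "Control": (14, 35.7, 35.7),
    "Individual Experimentals": (14, 42.9, 57.1),
    "Group Experimentals": (26, 34.6, 57.7),
}
for group, (n, before, after) in reported.items():
    print(group, counts_matching(before, n), counts_matching(after, n))
```

Exactly one count fits each reported figure: 5 and 5 for the Controls, 6 and 8 for the Individual Experimentals, and 9 and 15 for the Group Experimentals.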


Discussion 


The findings reported above indicate that vocational guidance by either the individual or the group method is effective in producing positive changes on relevant criteria. More important, however, is the finding that there were no differences between the two methods of dealing with vocationally undecided students. According to the criteria adopted for this study, group procedures accomplished work as effective as that of the more traditional individual approach.

In view of the smaller amount of time required by the group approach compared with the individual approach, this research would seem to provide a strong endorsement for group programs in vocational guidance.

Certain restrictions on this generalization
are necessary. It should be remembered
that the experimental population
was artificially restricted by limitations on 
college class, sex, motivation for group par- 
ticipation, lack of previous counseling con- 
tacts, and the completeness of psychometric 
records. As a consequence, the conclusions 
apply strictly to only a very limited segment 
of the total potential population. A second 
limitation is that our results apply only to 
those programs in vocational guidance where 
the objectives are similar to our objectives. 
To generalize to other types of guidance pro- 
grams, or to all counseling, is certainly un- 
warranted. 

However, the results are encouraging 
enough to suggest further studies of the 
group approach in more generalized popu- 
lations and in different types of counseling 
programs. 


Summary and Conclusions 


The heavy demand for professional serv- 
ices to vocationally undecided students makes 
the search for more economical ways of deal- 
ing with such students imperative. This 
study investigates the effectiveness of a group 
method in vocational guidance, and compares 
its effectiveness with that of individual coun- 
seling. 

Sixty freshman males who volunteered to 
participate in a group vocational guidance 
program composed the sample. Each student
was assigned, by a random method, to one of
three groups: (a) Individual Experimentals
who received individual counseling; (b)
Group Experimentals who participated in the
group program; or (c) Controls who were
not exposed to either treatment until after
the experimental period.

All 60 Ss indicated their tentative voca- 
tional choices, how certain they were of them, 
and how satisfied they were with them both 
before and after the experimental period. 
Skilled counselors rated each choice as to its 
realism. 

Analysis of covariance led to the following
conclusions:


1. With original status held constant, the 
Group Experimentals are significantly more 
certain of their vocational choices (p < .01), 
more satisfied with these choices (p < .01), 
and more realistic in them (p < .05) than 
the Controls. 

2. With original scores held constant, the 
Individual Experimentals are significantly 
more certain of their vocational choices 
(p < .01), more satisfied with these choices 
(p < .01), and probably more realistic in 
them (p < .10) than the Controls. 

3. No differences were found between the 
effectiveness of the individual counseling pro- 
gram and the group program. 

4. No differences were found in appropri-
ateness of certainty in terms of realism, prob-
ably because the Controls appropriately be-
came less certain, and the Experimentals ap-
propriately became more certain.

5. The time-saving quality of the group 
program, together with its demonstrated ef- 
fectiveness, argues for the institution of group 
programs in vocational guidance. 

6. A strong recommendation to this effect 
cannot be made until further studies have de- 
fined the limits of the generalization. 


Received March 15, 1954. 
It is probably quite true that a word has
no unique meaning, or, more properly, that 
the meaning of a word depends upon the con- 
text in which it is presented. In the latter 
sense, a word has an infinite number of mean- 
ings, each corresponding to a particular con- 
text. If such is the case, it is not possible to 
determine, either logically or experimentally, 
the generalized meaning of a word. However, 
it may be possible to present words in a 
particular context, and to determine their 
meanings in terms of that imposed context. 
The present study exemplifies this approach. 
For the purpose of communication, we shall 
use the term “meaning,” but only in this re- 
stricted sense. 

In addition to problems of semantics which 
motivate this study, the research was designed 
to answer specific practical questions. The 
successive-interval type of preference sched- 
ule has been demonstrated to be suitable for 
assessing preferences of subjects for various 
classes of consumer items. In the construc- 
tion of a successive-interval schedule, it is de- 
sirable that intervals be defined by particular 
descriptive phrases. Such “anchoring” of the 
intervals has been shown to increase the re- 
liability of preference responses, and to in- 
crease the amount of information, in the tech- 
nical sense, transmitted by responses (1, 2). 
The phrases should be suitably spaced along a continuum of meaning from phrases denoting the greatest degree of dislike for a consumer item to phrases denoting the greatest degree of like for an item. Further, phrases should be selected which are minimally ambiguous for the population of subjects studied. The objectives of the present study include the determination of phrases which meet these criteria, and which are then suitable as anchor points on preference schedules. The phrases investigated were selected to be typical phrases used to describe items of food. It was conjectured that the results might be generalized to the extent that the phrases useful for defining successive intervals on a food preference schedule might also be useful for defining intervals on schedules assessing preferences for other consumer goods.

1 This paper reports research sponsored by the Quartermaster Food and Container Institute for the Armed Forces, and has been assigned number 468 in the series of papers approved for publication. The views or conclusions contained in this report are those of the author and are not to be construed as necessarily reflecting the views or indorsement of the Department of Defense.


Procedure 


A total of 51 descriptive words and phrases was 
selected for the investigation. The words and phrases 
were themselves treated as items on a nine-interval 
questionnaire. Respondents were instructed to place 
a check mark in one of nine intervals for each de- 
scriptive phrase; three anchor points defined the set 
of response intervals: “greatest dislike,” at the left, 
“neither like nor dislike,” in the center, and “greatest 
like,” at the right. 

Questionnaires were administered to 905 enlisted 
personnel stationed at Fort Lee, Virginia. Subjects 
were stratified on the basis of educational level, for 
this was considered the variable most likely to be 
related to performance on the word meaning ques- 
tionnaire. Information gained from a nation-wide 
random sample of enlisted men in the continental 
Army, including over 7,500 respondents, was used as 
the basis for stratification. Since the character of 
instructions to respondents is an important deter- 
minant of results to a questionnaire of this kind, 
the instructions are reproduced as Table 1. 

Table 1
Instructions to Respondents

WORD MEANING TEST

In this test are words and phrases that people use to show like or dislike for food. For each word or phrase make a check mark to show what the word or phrase means to you. Look at the examples.

Example I

Suppose you heard a man in the mess hall say that he “barely liked” creamed corn. You would probably decide that he likes it only a little. To show the meaning of the phrase “barely like,” you would probably check under +1 on the scale below.

Scale: −4 (Greatest Dislike), −3, −2, −1, 0 (Neither Like Nor Dislike), +1, +2, +3, +4 (Greatest Like)
Barely like: ✓ under +1

Example II

If you heard someone say he had the “greatest possible dislike” for a certain food, you would probably check under −4, as shown on the scale below.

Scale: −4 (Greatest Dislike), −3, −2, −1, 0 (Neither Like Nor Dislike), +1, +2, +3, +4 (Greatest Like)
Greatest possible dislike: ✓ under −4

For each phrase on the following pages, check along the scale to show how much like or dislike the phrase means.


For the purpose of analysis, it was essential that respondents be rejected who performed inconsistently, did not understand the task, or were not attending to the task. Rejection criteria were threefold: (a) respondents who gave answers to fewer than one-half the items, (b) respondents whose answers followed systematic patterns (e.g., all check marks in one column, check marks in only two columns with a consistent design, etc.), and (c) respondents who gave inconsistent responses as shown by an analysis of responses to pairs of control items. Criterion c was based on the fact that for each pair of control items a respondent attending to the task would not place the second item higher, on the scale of meaning, than the first. For example, a respondent behaving consistently would not rate “strongly dislike” higher (denoting greater like) than “like intensely,” nor would he rate “dislike very much” higher than “welcome.” There were 18 control pairs; any subject showing inconsistency on 5 or more was rejected for purposes of further analysis. Responses to a control pair which implied that the words were equal in meaning were not scored as inconsistent responses; inconsistencies were always reversals in direction of the expected difference. The questionnaires of 71 subjects were rejected. Of these, 10 were rejected under criterion a, 7 were rejected under criterion b, and the remaining 54 were rejected under criterion c.
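Criteria a and c are mechanical enough to state as code. A minimal sketch follows, assuming each questionnaire is stored as a mapping from phrase to the checked interval (−4 to +4). Criterion b, which calls for recognizing a systematic pattern of marks, is left out, and the function names are hypothetical. The two example pairs are the control pairs named in the text; the ratings are invented.

```python
def count_reversals(ratings, control_pairs):
    """Number of control pairs whose expected order is reversed.

    ratings: dict mapping phrase -> checked interval (-4 .. +4).
    control_pairs: list of (expected_higher, expected_lower) phrase pairs.
    A tie is not a reversal; pairs with an unanswered phrase are skipped (assumption).
    """
    return sum(
        1
        for higher, lower in control_pairs
        if higher in ratings and lower in ratings and ratings[lower] > ratings[higher]
    )


def reject(ratings, n_items, control_pairs, min_reversals=5):
    """Criterion a (under half the items answered) or criterion c (5 or more reversals)."""
    too_few_answers = len(ratings) < n_items / 2
    inconsistent = count_reversals(ratings, control_pairs) >= min_reversals
    return too_few_answers or inconsistent


pairs = [("like intensely", "strongly dislike"), ("welcome", "dislike very much")]
answers = {"like intensely": 3, "strongly dislike": 4, "welcome": 1, "dislike very much": 1}
print(count_reversals(answers, pairs))  # 1: the first pair is reversed, the second is a tie
```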

A psychophysical method for scaling successive in- 
tervals, proposed by Edwards (3), was modified so 
as to be suitable for this study. The analysis pro- 
vides the determination of a psychological con- 
tinuum, in this case a continuum of meaning, which 


exhibits characteristics of an equal-interval scale. 
Thus, distances between words or phrases on this 
continuum may be compared meaningfully. For the 
purpose of the present study, the method has been 
modified so that ambiguity, or dispersion of mean- 
ing for the words and phrases, may be determined. 
The method is derived on the basis of an assump- 
tion that, for our group of respondents, each word 
or phrase has a modal contextual meaning and that 
the meaning attributed to each word by individual 
respondents is distributed normally about that modal 
value. The method provides a check of the assump- 
tion. In the case of word meanings the normality 
assumption appears to be reasonable on rational
grounds, for words do have meaning determined by 
the group or the society. Any differences between 
the meaning attributed to a word by an individual 
and that attributed to the word by the group, or 
the modal individual, might be considered a random 
error. 

The scaling method employed demands that the 
frequency of response be tabulated for each item, 
for each of the nine successive intervals. This fre-
quency is then transformed to a proportion of the
total number of responses, and the proportions are
cumulated across the intervals. Each cumulative
proportion is further transformed into a normal
deviate by use of tables of the normal distribution.
It is from the normal deviates, for all 51 items, that
we are able to derive the continuum of meaning.
Once this scale is established, it becomes a simple
process to determine the position of each word or
phrase on the scale. It is also possible to estimate
the dispersion, or ambiguity, of each phrase in terms
of the established scale units.
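The first steps of the scaling, from interval counts to normal deviates, are sketched below in Python. `normal_deviates` follows the description in the text. `interval_widths` shows one common way, in the spirit of Edwards's successive-intervals procedure, of combining the deviates of all items: each interval's width is estimated as the mean, over items that define both of its boundaries, of the difference between successive deviates. That second step is an assumption, since the article does not spell out its computations, and both names are hypothetical.

```python
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # unit normal: mean 0, standard deviation 1


def normal_deviates(freqs):
    """Unit normal deviates of the cumulative proportions at the 8 inner boundaries.

    freqs: one item's response counts in the 9 intervals, left to right.
    A cumulative proportion of 0 or 1 has no finite deviate and is returned as None.
    """
    total = sum(freqs)
    deviates, running = [], 0
    for f in freqs[:-1]:  # the cumulative proportion after the last interval is always 1
        running += f
        p = running / total
        deviates.append(STD_NORMAL.inv_cdf(p) if 0 < p < 1 else None)
    return deviates


def interval_widths(deviate_rows):
    """Estimated width of each inner interval, averaged over items defining both ends."""
    widths = []
    for k in range(1, len(deviate_rows[0])):
        diffs = [row[k] - row[k - 1] for row in deviate_rows
                 if row[k] is not None and row[k - 1] is not None]
        widths.append(mean(diffs) if diffs else None)
    return widths


print(normal_deviates([0, 0, 0, 0, 50, 50, 0, 0, 0]))
# [None, None, None, None, 0.0, None, None, None]
print(interval_widths([[-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5]]))  # [1.0, 1.0]
```

Summing the estimated widths from an arbitrary origin gives the boundary positions; the zero point can then be shifted so that the center of the middle interval sits at zero, as described in the next section.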


Results and Discussion 


In Table 2 appear the words and phrases 
investigated, listed in the order of their mean- 
ing values. The meaning value for an item 
is the scale value for that item. The scale 
value of the center of the middle interval on 
the questionnaire (labelled “Neither like nor 
dislike”) was arbitrarily defined to be zero.
A measure of ambiguity of meaning is given
as the standard deviation of responses for the
item.

Table 2

Scale Values, Standard Deviations, and Discrepancies of Reproduction for All Stimulus Items

Item                    Scale Value      SD  Discrepancy
*Best of all                   6.15    2.48         .004
*Favorite                      4.68    2.18         .003
*Like extremely                4.16    1.62         .009
*Like intensely                4.05    1.59         .009
*Excellent                     3.71    1.01         .007
*Wonderful                     3.51     .97         .009
Strongly like                  2.96       ?         .027
Like very much                 2.91     .60         .010
Mighty fine                    2.88       ?         .009
Especially good                2.86     .82         .010
Highly favorable               2.81       ?         .010
Like very well                 2.60       ?         .012
Very good                      2.56       ?         .014
Like quite a bit               2.32     .52         .010
Enjoy                          2.21       ?         .010
Preferred                      1.98    1.17         .008
Good                           1.91       ?         .010
Welcome                        1.77       ?         .012
Tasty                          1.76       ?         .013
Pleasing                       1.58       ?         .011
Like fairly well               1.51     .59         .018
Like                           1.35     .77         .017
Like moderately                1.12     .61         .009
OK                              .87    1.24         .031
Average                         .86    1.08         .053
Mildly like                     .85     .47         .015
Fair                            .78     .85         .015
Acceptable                      .73       ?         .013
Only fair                       .71       ?         .008
Like slightly                   .69       ?         .014
Neutral                         .02       ?         .015
Like not so well               −.30       ?         .031
Like not so much               −.41       ?         .028
Dislike slightly               −.59       ?         .020
Mildly dislike                 −.74     .35         .020
Not pleasing                   −.83       ?         .022
Don’t care for it             −1.10     .84         .020
Dislike moderately            −1.20       ?         .026
Poor                          −1.55       ?         .014
Dislike                       −1.58       ?         .021
Don’t like                    −1.81       ?         .017
Bad                           −2.02     .80         .010
Highly unfavorable            −2.16       ?         .064
*Strongly dislike             −2.37     .53         .014
*Dislike very much            −2.49     .64         .014
*Very bad                     −2.53     .64         .014
*Terrible                     −3.09     .98         .006
*Dislike intensely            −3.33    1.39         .014
*Loath                        −3.76    3.54         .007
*Dislike extremely            −4.32    1.86         .010
*Despise                      −6.44    3.62         .003

* Since more than 50% of respondents placed items in an extreme category, scale values were estimated on the basis of a linear extension of the scale.


For this study, scale values and dispersions were estimated graphically. Examples of the graphs appear in Fig. 1. A straight line was drawn through cumulative proportion points on normal probability paper for all proportions between .03 and .97 in a position which would minimize vertical discrepancies of the points from the line. Under the assumption of normality of distribution, cumulative proportions of responses, when plotted against cumulative interval width, should form a straight line. For most of the items, the resulting plots are remarkably linear. Figure 1 shows the plots for item 8, “Preferred,” item 32, “Loath,” and item 47, “Don’t care for it.” The scale values for items 8 and 47 can be read directly from the graph as the values, in scale units, of the intersections of the linear plots with the fiftieth percentile; these were determined to be 1.98 and −1.10, respectively. For item 32, since more than 50% of the respondents judged the item in an extreme category on the questionnaire, the linear plot was extended in the negative direction in order to obtain an estimate of scale value. This value was estimated at −3.76. The slopes of the plotted lines are inversely proportional to the standard deviations of the normal distributions. For the three items exemplified, the standard deviations were estimated to be 1.17, 3.54, and .84, respectively.

Fig. 1. Graphical plots for three typical phrases.

It will be noted (Table 2) that ambiguity 
associated with phrases at either extreme of 
meaning is systematically greater than am- 
biguity of words falling near the middle of 
the meaning scale. This is consistent with 
the finding that the widths of the arbitrary, 
raw successive intervals are comparatively 
small for center intervals, and larger for in- 
tervals more extreme. 

In Figs. 2 and 3 are displayed the six items
for which the graphs show the most marked 
departures from normality. In Fig. 2 appear 
graphs for item 18, “Highly unfavorable,” 
item 39, “Dislike moderately,” and item 50, 


“Strongly like.” Items 18 and 39 show evi-
dence of positively skewed distributions; a
sizable proportion of respondents marks
the items as if they were on the positive, like,
side of the scale. The reverse holds true for
item 50, the distribution for which demon-
strates negative skewness. For these three
items, the assumption of normality becomes
suspect. However, for purposes of analysis,
the best linear plot was fitted and scale values
and dispersions were determined as for the 
other items. 
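The article judged skewness by eye from the probability plots. As a numeric companion (a swapped-in technique, not the authors' procedure), the moment coefficient of skewness of an item's interval counts is positive when a minority of respondents pulls the distribution toward the like end, as for items 18 and 39, and negative in the opposite case, as for item 50. The sketch assumes responses sit at equally spaced positions −4 to +4, that is, the raw questionnaire intervals rather than the fitted scale; the counts and the function name are hypothetical.

```python
def interval_skewness(counts, positions=tuple(range(-4, 5))):
    """Moment coefficient of skewness for one item's responses in the 9 intervals.

    Assumption: responses sit at equally spaced positions -4..+4 (the raw
    questionnaire intervals), not at positions on the fitted scale of meaning.
    """
    n = sum(counts)
    m = sum(c * x for c, x in zip(counts, positions)) / n
    m2 = sum(c * (x - m) ** 2 for c, x in zip(counts, positions)) / n
    m3 = sum(c * (x - m) ** 3 for c, x in zip(counts, positions)) / n
    return m3 / m2 ** 1.5


# Invented counts: most responses on the dislike side, one far over on the like side.
print(interval_skewness([5, 3, 1, 0, 0, 0, 0, 0, 1]) > 0)  # True
print(interval_skewness([0, 0, 1, 2, 4, 2, 1, 0, 0]))      # 0.0
```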

It might be conjectured that the skewness 
associated with distributions of response to 
the three items of Fig. 2 has a common 
source: the incongruous implications of the 
adverbs with the adjectives which they 
modify. In typical social usage, the ad- 
verbs “highly” and “moderately” modify posi- 
tive, not negative, terms, while the adverb 
“strongly” is more often a negative modifier. 
Familiar terms are “highly respected,” “mod-
erately favorable,” and “strongly opposed.”
To the extent that this is the case, and that 
social usage determines meaning for an indi- 
vidual, the observed skewness would be ex- 
pected. 

The other three items where the distribution departs noticeably from normality, presented in Fig. 3, are item 10, “Like not so well,” item 28, “Like not so much,” and item 31, “Average.” Items 10 and 28 exhibit evidence of a bimodal distribution, with one mode on the dislike side of neutral, the other mode on the like side. Such results are not surprising and simply attest to the ambiguity of wording of these two phrases. Item 31, “Average,” also exhibits a bimodal distribution in terms of the scale continuum, but a bimodal distribution of a different sort than for the other two items of Fig. 3. Many of the respondents interpreted “Average” to imply neutral, and placed the meaning of the word near the center of the scale. Another appreciable group of subjects interpreted “Average” as more or less synonymous with “Like moderately,” and placed the word two steps above the center interval. Few subjects placed the word between the center interval and two steps above the center interval.

Fig. 2. Graphical plots for three phrases displaying skewed distributions.

Fig. 3. Graphical plots for three phrases displaying bimodal distributions.

In addition to the linearity of plots through 
cumulative proportions of responses on arith- 
metic probability paper, a check on the as- 
sumption of normality of distributions re- 
sides in the extent to which, from the scale 
value and dispersion value for each item, the 
empirically determined proportions of re- 
sponses in the raw questionnaire intervals 
can be reproduced. Assuming a normal distribution,
one may utilize tables of that distribution to derive
theoretical proportions for each of the first eight
intervals on the basis of the scale value and standard
deviation associated with an item. (Once proportions
have been estimated for eight intervals, the proportion
of responses in the ninth interval is completely
determined.) The average difference between the
theoretical cumulative proportions and the actual empirical
cumulative proportions appears for each item 
in Table 2 as the discrepancy of reproduc- 
tion. The average discrepancy, over all 51 
items, is .015, which compares favorably with 
discrepancies reported for other studies of 
this sort. Edwards reports an average dis- 
crepancy of .025 for reproducing ratings of 
food items in a sample of college students, 
and an average discrepancy of .021 in re- 
producing nationality preference ratings (3). 
Using the method of paired comparisons, 
other writers report average discrepancies 
ranging from .024 to .031. 
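The discrepancy of reproduction can be computed as follows. This is a sketch under two assumptions: the differences are taken in absolute value before averaging (signed differences could cancel), and the boundaries are the eight inner interval boundaries expressed in scale units. The function name and the test values are hypothetical.

```python
from statistics import NormalDist


def reproduction_discrepancy(boundaries, cum_props, scale_value, sd):
    """Mean absolute gap between theoretical and observed cumulative proportions.

    boundaries: positions of the inner interval boundaries on the scale.
    cum_props:  observed cumulative proportions at those boundaries.
    """
    model = NormalDist(mu=scale_value, sigma=sd)
    gaps = [abs(model.cdf(b) - p) for b, p in zip(boundaries, cum_props)]
    return sum(gaps) / len(gaps)


bounds = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
item = NormalDist(mu=0.4, sigma=1.2)
observed = [item.cdf(b) + 0.01 for b in bounds]  # every point off by .01
print(round(reproduction_discrepancy(bounds, observed, 0.4, 1.2), 6))  # 0.01
```

Averaging such per-item values over all items gives a summary figure of the kind reported above (.015 over 51 items).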

The present application of psychophysics 
to semantics is not dissimilar from that re- 
ported by Mosier (5), who presented to ap- 
proximately 150 college students a successive- 
interval questionnaire containing 296 adjec- 
tives as stimuli. Respondents were asked to 
judge each stimulus on an 11-point “favor- 
ableness-unfavorableness” scale. The method 
of analysis utilized by Mosier (4) was one 
which assumed a Gaussian response distribu- 
tion and for which the unit of measurement 
was chosen to be the standard deviation of 
one phrase, “Completely unsatisfactory.” The 
method of successive intervals applied in the 
present study also assumes normality, but 
takes as a unit of measurement the more 
stable harmonic mean standard deviation of 
response over all stimuli. 

Certain similarities and disparities of re- 
sults from the present study, compared with 
that of Mosier, are worthy of attention. The 
present study has served to reconfirm the 
two basic hypotheses postulated by Mosier: 
(a) that the meaning of a word may be con- 
sidered as if it had two parts, one constant, 
representing typical social meaning, the other 
variable, representing individual interpreta- 
tion; and (b) that the frequency of responses
to a word can be made to project a Gaussian
distribution along a scale, and that distribu-
tions of responses to nearly all words will be
normal on the same scale. As was the case 
with Mosier’s results, special diagnosis of the 
few words exhibiting nonnormal distribution, 
e.g., bimodal distributions, has yielded inter- 
esting semantic implications. 

Mosier reports several indications of a 
break in the word meaning continuum at the 
middle of the scale. He found for a number 
of words a “precipice effect,” an apparent 
end-effect at the middle category. There was 
a tendency for large dispersions to be associ- 
ated not only with words at the two extremes 
of meaning, but also with words near the 
point of neutrality on the favorableness-un- 
favorableness continuum. Such results led 
to the speculation that two distinct scales 
were involved “. . . that favorableness-neu- 
tral is one continuum and neutral-unfavor- 
able is another, not collinear with the first” 
(5, p. 133). No analogous effects appear in 
the present investigation. It seems likely 
that the crucial difference resides in different 
instructions to respondents. Mosier’s instruc- 
tion “Try to keep the steps between 1 and 6, 
6 and 11 equal so far as differences of favor- 
ableness are concerned” may have prompted 
subjects to view the response set in terms of 
two 6-interval scales rather than one 11-in- 
terval scale. That particular instructions may 
have determined the observed effects was con- 
sidered a possibility by Mosier (5, p. 133). 

In the present study the words and phrases 
all were presented under comparable condi- 
tions, as items on a successive-interval ques- 
tionnaire. The scale parameters as derived 
are dependent upon the comparability of 
mode of presentation of the phrases. It is 
not expected that scale parameters will be 
invariant under changing contextual condi- 
tions of presentation. Thus, if particular 
phrases are chosen as descriptive adjectives 
for categories on a successive-interval scale, 
the scale parameters may be affected by the 
number of intervals on the scale, the prefer- 
ence distribution of items being judged, the 
order of and distances between phrases selected, etc. The usefulness of the present results does not seem to be severely restricted by such considerations, for these results relate directly to the meaning of phrases under uniform conditions of presentation, without differential context effects.


Summary 


Fifty-one descriptive adjectives were pre- 
sented as items on a successive-interval sched- 
ule to approximately 900 Army enlisted 
personnel, who were asked to indicate the 
meaning of each word or phrase. For each 
stimulus a scale value and a standard devia- 
tion of meaning were determined. From these 
two parameters, an attempt was made to re- 
produce the cumulative proportions of re- 
sponse for each of the nine categories on the 
questionnaire. The suitability of the suc- 
cessive-interval method of scaling is demon- 
strated by the small average error of repro- 
duction of proportions, about .015 in this 
case. A further check upon the method was
derived by analysis of distributions of re-
sponses over the scale continuum. For all
but 6 of the 51 stimuli, these distributions 
did not depart appreciably from normality. 

Results are compared with those reported 
in a similar study by Mosier (5). 

From results of the present study, it is pos- 
sible to select suitable descriptive adjectives 
for use as labels of successive intervals on 
subsequent preference schedules. 
Rater Reliability and the Heterogeneity of the Scale Anchors


A. W. Bendig 
University of Pittsburgh 


The problem of the optimum method of 
verbally anchoring the categories on a rating 
scale has received little empirical investiga- 
tion. One recent study (1) had college sub- 
jects (Ss) rate for familiarity 12 foreign coun- 
tries on scales varying in the amount of verbal 
anchoring. Scales with 3, 5, 7, 9, or 11 cate- 
gories were anchored (a) only at the center 
category, (b) at both ends, but not in the 
center, or (c) at both ends and also in the 
center. A small increase in individual rater 
reliability was noted as the number of scale 
anchors increased, with the larger increase 
found between conditions b and c.

Another related problem in verbally anchor- 
ing scales is the heterogeneity of the anchors 
used to describe the ends of the scale. For 
example, an S might be requested to rate a 
series of foods for preference value on a 
scale whose end anchoring statements were 
“like slightly” and “dislike slightly.” This 
would mean that no matter how much S liked 
or disliked a particular food the above two
statements are the most extreme preference 
statements that he can report. On the other 
hand, the end statements might be “excel- 
lent” and “despise,” which permit much more 
extreme preference statements. We suggest 
that this second scale, whose end anchoring 
statements are much farther from a neutral 
preference response, has greater psychological 
“width” than does the first scale. The second 
scale, with more heterogeneous end anchors, 
attempts to cover more shades of difference 
between stimuli of varying preference value. 
Rationally it might be assumed that greater 
scale width would encourage raters to report 
individual differences between the stimuli 
more consistently, thus increasing rater re- 
liability, and that this increased reliability 
would be constant regardless of the hetero- 
geneity of the list of foods being rated. 
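The notion of scale “width” can be made concrete with Jones and Thurstone's scale values for the end anchors of the two example scales: like slightly (.69) and dislike slightly (−.59), and excellent (3.71) and despise (−6.44). Taking width as the difference between the scale values of the two end anchors is an illustrative assumption, not necessarily the measure used in this study, and `scale_width` and `ANCHOR_VALUES` are hypothetical names.

```python
# Scale values of four anchors, as reported by Jones and Thurstone.
ANCHOR_VALUES = {
    "like slightly": 0.69,
    "dislike slightly": -0.59,
    "excellent": 3.71,
    "despise": -6.44,
}


def scale_width(upper_anchor, lower_anchor, values=ANCHOR_VALUES):
    """Distance on the scale of meaning between a rating scale's two end anchors."""
    return values[upper_anchor] - values[lower_anchor]


print(round(scale_width("like slightly", "dislike slightly"), 2))  # 1.28
print(round(scale_width("excellent", "despise"), 2))               # 10.15
```

On this reading the second scale is roughly eight times as wide as the first.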

In the selection of verbal anchors for food 
preference schedules, the recent work of Jones 
and Thurstone (7) has provided scale values 
for a number of such anchors. Using Edwards’
method of successive intervals (5),
these authors had 51 verbal anchors scaled
as to intensity of the preference shown by 
each anchor and have provided median scale 
values and measures of the ambiguity of each 
anchor. We propose to use these scale values 
as measures of the dissimilarity or hetero- 
geneity of the scale anchors in investigating 
the effect of variations in anchors upon rater 
reliability.


Procedure


Stimuli. Two lists of 10 foods each were derived 
from a list of 20 foods. The Ss in a previous study 
(2) had rated for preference value all 20 foods and 
the mean ratings were used to rank the foods. List 
I in the present study consisted of the 10 foods 
ranked 6 through 15, while List II contained the 
foods ranked 1 through 5 and 16 through 20. List 
I was composed of a relatively homogeneous group 
of food stimuli and was identical with List 3 used 
in a previous study (3), while List II contained a 
heterogeneous series of foods (those most and least 
preferred out of the original list of 20) and is the 
same as the previously used List I (3). 

Scales. Four 5-category rating scales were constructed with verbal anchors used at the center and end categories; the second and fourth categories were left unanchored. In Table 1 are given nine anchoring statements and their mean scale values and variabilities as reported by Jones and Thurstone (7). These nine anchors were selected from their list under the criteria that they represented approximately equal units in scale values along the preference continuum and also had the smallest standard deviation at each scale point. The word “neutral” was used to anchor the center category for all four scales. Anchors D and F were used as end anchors for Scale 1; C and G for Scale 2; B and H for Scale 3; and A and I for Scale 4. A crude measure of the psychological width of each scale is the algebraic difference between scale values for the end anchors. For example, the difference between the scale value of anchor A (2.32) and anchor I (−2.02) is 4.34, which is taken as an index of the psychological width of the anchoring statements used with Scale 4.


Table 1
Verbal Scale Anchors Used with Rating Scales
(Data taken from Jones and Thurstone [7].)

Anchor   Wording               Scale Value    SD    Discrepancy
A        Like quite a bit          2.32       .52       .010
B        Like fairly well          1.51       .59       .018
C        Like moderately           1.12        ?        .009
D        Like slightly              .69       .32       .014
E        Neutral                    .02       .18       .015
F        Dislike slightly          −.59       .27       .020
G        Dislike moderately       −1.20       .41       .026
H        Poor                     −1.55       .87       .014
I        Bad                      −2.02       .80       .010

? Illegible in the source scan.


Table 2
Rater Reliability and Bias Measures for Scales Varying in the Heterogeneity of the Scale Anchors

Scale   Anchors   Anchor Width   Stimulus List   Number of Raters
1       DEF       1.3            I               25
                                 II              25
2       CEG       2.3            I               25
                                 II              23
3       BEH       3.1            I               24
                                 II              26
4       AEI       4.3            I               26
                                 II              26

[The rater reliability and rater bias coefficients, each with lower and upper 90% confidence limits, are illegible in the source scan.]

* Coefficient of .06 significant at the .01 level.
** Coefficient of .36 significant at the .05 level and .47 at the .01 level.
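The width index is plain arithmetic on Table 1. The minimal Python sketch below (the dictionary and function names are ours, not the author's) recomputes the end-anchor differences for Scales 1 to 4; rounded to one decimal they give the 1.3, 2.3, 3.1, and 4.3 listed in Table 2, and the ratio of the widest to the narrowest width (4.34 / 1.28, about 3.4) corresponds to the "tripling" discussed later.

```python
# Mean scale values of the nine anchors, copied from Table 1 (Jones and Thurstone).
SCALE_VALUES = {
    "A": 2.32, "B": 1.51, "C": 1.12, "D": 0.69, "E": 0.02,
    "F": -0.59, "G": -1.20, "H": -1.55, "I": -2.02,
}

# End anchors for each scale; anchor E ("Neutral") sits at the center of all four.
SCALE_END_ANCHORS = {1: ("D", "F"), 2: ("C", "G"), 3: ("B", "H"), 4: ("A", "I")}


def anchor_width(positive: str, negative: str) -> float:
    """Algebraic difference between the two end-anchor scale values."""
    return SCALE_VALUES[positive] - SCALE_VALUES[negative]


if __name__ == "__main__":
    for scale, (pos, neg) in SCALE_END_ANCHORS.items():
        print(f"Scale {scale} ({pos}E{neg}): width = {anchor_width(pos, neg):.2f}")
```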

Subjects. The eight combinations of two stimulus 
lists and four rating scales were mimeographed on 
single sheets with rater instructions similar to those 
used in previous studies (2, 3). These sheets were 
randomly distributed to 200 Ss enrolled in six day- 
time sections of introductory psychology at the Uni- 
versity of Pittsburgh. The number of Ss in each of 
the eight groups varied only slightly because of 
chance fluctuations. 


Results 


Estimates of the individual rater reliability 
of each scale were computed by analysis of 
variance techniques and can be found in 
Table 2. Measures of rater bias and the 
90% fiducial limits for both reliability and 
bias measures are also given in Table 2 (8, 
pp. 361-362). Rater bias, as previously 
used (3, 4), is a measure of the extent of individual differences between raters in the
mean preference rating each rater assigns to 
all 10 stimuli. 
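The article computes individual rater reliability "by analysis of variance techniques" without printing the formula. One common estimator, assumed here rather than taken from the source, treats the ratings as a stimuli × raters table and forms (MS_stimuli − MS_error) / (MS_stimuli + (k − 1) MS_error) for k raters. The function name and the toy ratings are hypothetical.

```python
from statistics import fmean


def individual_rater_reliability(ratings: list[list[float]]) -> float:
    """Single-rater intraclass coefficient from a stimuli x raters ANOVA.

    ratings[i][j] is the rating that rater j gave to stimulus i.
    Assumed estimator: (MS_stimuli - MS_error) / (MS_stimuli + (k - 1) * MS_error).
    """
    n = len(ratings)       # number of stimuli (10 foods per list in the study)
    k = len(ratings[0])    # number of raters
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(ratings[i][j] for i in range(n)) for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_stimuli = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)  # rater (bias) term
    ss_error = ss_total - ss_stimuli - ss_raters

    ms_stimuli = ss_stimuli / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_stimuli - ms_error) / (ms_stimuli + (k - 1) * ms_error)


if __name__ == "__main__":
    # Hypothetical ratings of 4 foods by 3 raters; rater 3 rates everything one step higher.
    example = [[1, 1, 2], [3, 3, 4], [4, 4, 5], [2, 2, 3]]
    print(round(individual_rater_reliability(example), 3))  # 1.0
```

A constant offset for one rater is absorbed by the raters term, which is why such an offset shows up as rater bias rather than as lowered reliability.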

To assess the significance of the effect of 
stimulus list and scale anchoring upon rater 
reliability and bias, the measures reported in 
Table 2 were normalized and subjected to an 
analysis of variance. Rater bias measures
were normalized by the usual r-to-z transformation, while the reliability measures were
normalized by the special formula given by
Fisher (6, p. 219) for intraclass reliability
coefficients. The normalized measures were
analyzed as a two-variable analysis of variance design with the major variables being
the two stimulus lists and the four scale anchoring conditions. In addition, the between-scales term was split into two components:
(a) the variance attributable to a linear relationship between scale anchor width and
measures of reliability or bias, and (b) the
residual variation around this linear regression line.
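Both normalizing transformations fit in a few lines. The intraclass version below is Fisher's formula for a coefficient based on k members per class, z = ½ ln[(1 + (k − 1)r) / (1 − r)]; taking k as the number of raters in a group is our assumption about how it was applied here. With k = 2 it reduces to the ordinary r-to-z transformation.

```python
import math


def r_to_z(r: float) -> float:
    """Usual Fisher r-to-z transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def intraclass_to_z(r: float, k: int) -> float:
    """Fisher's transformation for an intraclass coefficient with k members per class."""
    return 0.5 * math.log((1 + (k - 1) * r) / (1 - r))


if __name__ == "__main__":
    # Assumed k = 25 raters, roughly the group size in this study.
    for r in (0.14, 0.22):
        print(r, round(intraclass_to_z(r, 25), 3))
```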

The results of these two analyses of variance can be found in Table 3. For measures of rater reliability, the difference between the two stimulus lists is significant at the .01 level. The linear relation between anchor width and rater reliability falls just short of statistical significance at the .05 level of confidence (with a two-tailed test an F of 10.1 is significant at .05). However, the magnitude of the mean square for this linear relation and the product-moment correlation between anchor width and rater reliability measures (.90) suggest that this may be a real relation. As for rater bias, none of the F ratios in Table 3 suggests that there is any relation between either the stimulus lists or scale anchoring and measures of rater bias.


Table 3
Analyses of Variance of Transformed (r-to-z) Rater Reliability and Bias Coefficients

                        Rater Reliability          Rater Bias
Source of Variation     Mean Square    F           Mean Square    F
Lists                   .2023          ?**         .0770          2.1
Scales                  .0404          3.9         .0388          1.1
  Linearity             .0988          9.6*        .0236          0.9
  Residual              .0112          1.1         .0464          1.3
Error                   .0103                      .0365

* Significant at the .10 level.
** Significant at the .01 level.
? Illegible in the source scan.
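Each F in Table 3 is a mean square divided by the error mean square, and the linearity and residual sums of squares should add back to the between-scales sum of squares, so the table can be checked arithmetically. The degrees of freedom below (1, 3, 1, 2, 3) are inferred from the 2 × 4 layout with one transformed coefficient per cell; they are not printed in the source. For the bias linearity row the division gives about 0.65 rather than the 0.9 shown, so that printed value may be a misprint or a misreading of the scan.

```python
# Mean squares from Table 3.
TABLE3 = {
    "reliability": {"Lists": 0.2023, "Scales": 0.0404, "Linearity": 0.0988,
                    "Residual": 0.0112, "Error": 0.0103},
    "bias": {"Lists": 0.0770, "Scales": 0.0388, "Linearity": 0.0236,
             "Residual": 0.0464, "Error": 0.0365},
}

# Degrees of freedom inferred (not printed) for a 2 lists x 4 scales layout.
DF = {"Lists": 1, "Scales": 3, "Linearity": 1, "Residual": 2, "Error": 3}


def f_ratios(ms: dict[str, float]) -> dict[str, float]:
    """F = MS(source) / MS(error) for every non-error source."""
    return {src: ms[src] / ms["Error"] for src in ms if src != "Error"}


def partition_is_consistent(ms: dict[str, float], tol: float = 5e-4) -> bool:
    """Check SS(scales) == SS(linearity) + SS(residual), using SS = MS * df."""
    ss_scales = ms["Scales"] * DF["Scales"]
    ss_parts = ms["Linearity"] * DF["Linearity"] + ms["Residual"] * DF["Residual"]
    return abs(ss_scales - ss_parts) < tol


if __name__ == "__main__":
    for measure, ms in TABLE3.items():
        ratios = {src: round(f, 2) for src, f in f_ratios(ms).items()}
        print(measure, ratios, partition_is_consistent(ms))
```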


Discussion 


The results of the present study indicate 
that the psychological width of the verbal 
anchors used to define the end categories of 
a rating scale influences rater reliability only 
slightly, and rater bias not at all. The data 
reported in Tables 2 and 3 suggest that a 
rectilinear relation exists between anchor 
width and rater reliability, but that tripling 
the width of the anchor raises the mean in- 
dividual rater reliability from .14 to .22. No 
such increase is evident in the measures of 
rater bias. 

The significant difference between the re- 
liabilities found for Lists I and II confirms 
the results of a previous study (3) where 
rater reliability varied directly as a func- 
tion of the heterogeneity of the stimuli rated. 
As also noted previously (3), rater bias ap- 
pears unaffected by stimulus heterogeneity. 
It should be noted that the mean reliability 
with List I found in this study (.10) is al- 
most identical to the reliability of the same 
stimulus list previously used with a 5-cate- 
gory scale (.09) and that highly similar re- 
liabilities were also found for List II (.25 
and .26) in these two studies (3). Both the 
absolute values of the reliabilities of Lists I 
and II and the differences between the two 
stimulus lists are quite stable in successive 
samples of raters. 

These results and those of earlier studies 
(1, 2, 3, 4) suggest that the reliability of a 
rating scale is influenced more by character- 
istics of the rater than by characteristics of 
the construction of the scale. For example, 
tripling the width of the scale anchors increased rater reliability by an average of .08,
while increasing the number of verbal anchors
defining scale categories from one to three
raised reliability only .07 (1, p. 40). Increasing the number of scale categories from
three to nine shows little evidence of affecting scale reliability (1, 2, 3). However, relatively small differences in the heterogeneity
of the rated stimuli increased reliability by
approximately .16 (3), while educational differences between raters raise the reliability
by as much as .18 (4). The latter study (4)
also indicated a possible interaction between 
educational level of the raters and scale 
length, with less experienced raters profiting 
more from the use of longer scales. 


Summary 


Raters (N = 200) used 5-category rating
scales, which varied in the heterogeneity of 
the verbal anchors defining the end cate- 
gories, to rate for preference value two lists 
of 10 foods. The food lists differed in the 
homogeneity of food stimuli on each list. 
Measures of individual rater reliability and 
rater bias were computed and analyzed as to 
the effect of scale and list differences. Re- 
liability was significantly smaller for the more 
homogeneous list and increased linearly as a 
function of the heterogeneity of the end an- 
chors. Rater bias was unaffected by either 
scale or list differences. 


Received February 8, 1954. 
The Relationship of Patterning (Underlining) to Immediate
Retention and to Acceptability of Technical Material * 


George R. Klare, James E. Mabry, and Levarl M. Gustafson 


University of Illinois 2 


This study is one of a series* on the rela- 
tionship of various communication variables 
to the learning of technical training material. 
“Patterning,” as used here, refers to the stress 
or emphasis of various words in the context 
of printed material. Patterning can be ac- 
complished in a number of ways, such as 
capital letters, type face, type size, italic or 
boldface type, underlining, etc. Here, how- 
ever, the emphasis is not on methods of em- 
phasis but rather on the effect of emphasis on 
measures related to learning of the material. 

Emphasis as a factor in verbal learning, in 
terms of “vividness,” may be traced back to 
the British Associationists, and may be carried through “rhythmical grouping,” contrasting print (3), etc., to recent studies of
“isolation” (7). The studies made, however, 
have not been on meaningful prose. Appar- 
ently the only application of stress techniques 
to prose is that of Dearborn, Johnston, and 
Carmichael (2). These investigators at- 
tempted to emphasize the one word in a 
typed sentence which carries the “peak stress” 
by using all capitals, overstriking to simulate 
boldface, and underlining. Increased com- 
prehension was found in several studies with 
college students. 

This study was an attempt to relate patterning of technical material to immediate retention test score, amount read in a given
time, and acceptability of the material.


Method 


Experimental materials. A 1,206-word multilith- 
printed lesson from an aircraft mechanics training 
course at Chanute Air Force Base was used in this 
study. It consisted of a first half on the “induc- 
tion system” and a second half on the “cooling sys- 
tem” of an aircraft engine. 

The Present (the standard, unpatterned) lesson 
was used along with two experimental lessons in 
which underlining was used to achieve stress. The 
first experimental pattern (called PPA) was the un- 
derlining of those words in the lesson which were to 
appear in the correct answers to the 50-item multi- 
ple-choice test (see below) used to measure subjects’ 
retention. Since the correct answers to test items
were not lifted directly from the reading material,
this actually amounted to underlining as many as
possible of the words that were to appear.

The second experimental pattern (called PPB) 
was the underlining of “important” words. Impor- 
tant words, as here defined, are those which carry 
essential, specialized information, without regard to 
whether or not they occur in correct test item 
choices. Examples of such words are technical terms 
(e.g., “master control”), position terms (e.g., “. . .
located just below the master control and on the 
forward side of ...”), and differentiation terms 
(e.g., “. . . the odd numbers being left pumps while 
the even numbers are right . . .”). 

The underlining was done by the three authors, 
and where questions of underlining or not under- 
lining a word arose, they were resolved jointly. 
PPA had 199 words underlined; PPB had 129. Sub-
jects (Ss) were given no information regarding the 
purpose of the underlining before reading. 

Nine “versions” were used in the study. Three 
were called “unsplit,” treating both halves of the 
lesson in the same way; these were P (for Present),
PPA (prosodic patterning A), and PPB (prosodic pat- 
terning B). Six versions were called “split,” treat- 
ing the two halves of the lesson in a different style; 
these were PPPA (first half Present, second half 
PPA), PPAP, PPPB, PPBP, PPAB, and PPBA. All 
versions had the same format except for the under- 
lining. 
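The nine versions follow a simple scheme: three unsplit lessons plus every ordered pair of two different treatments for the two halves (3 × 2 = 6). The naming rule used below ("PP" followed by one-letter codes for the first and second halves) is inferred from the six names listed, not stated by the authors; the group sizes are the ones reported under Subjects and procedure.

```python
from itertools import permutations

# One-letter codes inferred from the version names: P = Present, A = PPA, B = PPB.
CODES = {"Present": "P", "PPA": "A", "PPB": "B"}


def lesson_versions() -> tuple[list[str], list[str]]:
    """Return (unsplit, split) version names."""
    unsplit = ["P", "PPA", "PPB"]
    # Split versions: first half in one treatment, second half in a different one.
    split = ["PP" + CODES[first] + CODES[second]
             for first, second in permutations(CODES, 2)]
    return unsplit, split


def total_subjects(per_group: int = 110, short_group: str = "PPBA", short_size: int = 109) -> int:
    """Total N when every version has 110 Ss except PPBA, which has 109."""
    unsplit, split = lesson_versions()
    return sum(short_size if v == short_group else per_group for v in unsplit + split)


if __name__ == "__main__":
    print(lesson_versions())
    print(total_subjects())  # 989
```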

Every fifth line of each version was numbered in 
order to help in getting a measure of amount read 
by Ss. Each S made a tally mark for each complete 
reading during the experimental period, and indicated 
the line he was reading when asked to stop. 

A 50-item multiple-choice test was used to meas- 
ure immediate retention. The items were selected 
from a pool of 112 items in such a way that each
paragraph of the reading material was allotted a 
percentage of items proportional to its size. The 
split-half reliability of the test was .87. 
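Each paragraph received a share of the 50 items proportional to its size. The source does not say how fractional shares were rounded; the sketch assumes a largest-remainder rule, and the paragraph word counts in the usage line are invented for illustration (they merely sum to the lesson's 1,206 words).

```python
def allocate_items(paragraph_sizes: list[int], total_items: int = 50) -> list[int]:
    """Split total_items across paragraphs in proportion to their sizes.

    Assumed rounding rule: floor every quota, then hand the leftover items
    to the paragraphs with the largest fractional remainders.
    """
    total_size = sum(paragraph_sizes)
    quotas = [total_items * size / total_size for size in paragraph_sizes]
    alloc = [int(q) for q in quotas]
    shortfall = total_items - sum(alloc)
    by_remainder = sorted(range(len(quotas)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in by_remainder[:shortfall]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical word counts for six paragraphs (sum = 1,206).
    print(allocate_items([180, 240, 150, 210, 126, 300]))  # [8, 10, 6, 9, 5, 12]
```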

Acceptability of the patterning presentations was 
determined by the answers to three questions. The 
first asked if Ss noticed a difference in the way the 
two halves of the lesson were presented (they were 
asked not to answer in terms of content). Those
who noticed a difference were then asked which half 
of the material they thought “easier to read” (ques- 
tion 2) and which half “more pleasant to read” 
(question 3). All Ss, those who had the unsplit as 
well as those who had the split versions, answered 
these questions. 

Subjects and procedure. The Ss used were 989 
male airmen in indoctrination training at Sampson 
Air Force Base (109 Ss for PPBA, 110 for all other 
versions). Various aptitude indices were collected 
for all airmen, based on the Airman Classification 
Test Battery (these indices, based on a scale of 1 to 
9, are conveniently called “stanines”).

The Ss were given 20 minutes to read the lesson, 
and during that period indicated amount read in the 
way previously described. The Ss then answered 
the three attitude questions designed to determine 
acceptability of the presentations. Following this, 
40 minutes were allowed for answering the test. 


Results 


Amount read (20 minutes). The Present 
version resulted in fewer lines read than the 
PPA version, and the PPA version fewer than 
the PPB. The differences were small and 
were not significant. The same was true for 
number of words read, with the PPB version 
superior but not significantly so. It appears 
that patterning as used here neither helped 
nor hindered Ss in their reading. 

Acceptability. Acceptability of the pat- 
terning presentations, it will be recalled, was 
determined by the answers to three questions 
(“did the halves differ,” “which half easier to 
read,” “which half more pleasant to read”).
It was found that despite instructions to answer on the basis of presentation of the material and not content, Ss’ answers were based on content as well. This is shown by the fact that the “cooling system” material was preferred to the “induction system” material whatever the manner of presentation.


Table 1
Test Score Data in Relation to the Three Patterning Versions (N = 330)

Patterning Version     N      Mean*     SD
Present               110     24.69     8.23
PPA                   110     25.35    10.39
PPB                   110     26.45     8.82

* Maximum possible score, 50; chance score, 12.5.

Since split versions were used in the experi- 
ment, however, it was possible to equate for 
content with respect to the patterning treatment used. Two 6 × 2 contingency tables
were set up (split versions × lesson halves),
one for “which half easier” and one for
“which half more pleasant.” They yielded
χ² values of 3.55 (.70 > p > .50) and 3.27
(.70 > p > .50), respectively, showing no
significant differences in manner of presenta- 
tion (Present, PPA, and PPB). Examination 
of the individual values showed, if anything, 
a very slight tendency to favor the patterned 
versions over the Present version. Tetra- 
choric correlation coefficients computed be- 
tween judgments of “which half easier” and 
“which half more pleasant” ranged from .80 
to .97. 
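Each 6 × 2 table has (6 − 1)(2 − 1) = 5 degrees of freedom, and the reported probability brackets can be checked without statistical tables. For odd degrees of freedom the upper-tail chi-square probability has a closed form in terms of the normal tail; the helper below is our own illustration of that identity, not a procedure from the article.

```python
import math


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square_df > x) for odd df.

    Uses Q = 2 * (1 - Phi(z)) + 2 * phi(z) * sum_{r=1}^{(df-1)/2} z**(2r-1) / (1*3*...*(2r-1)),
    with z = sqrt(x), Phi/phi the standard normal CDF/density.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch handles odd degrees of freedom only")
    if x <= 0:
        return 1.0
    z = math.sqrt(x)
    tail = math.erfc(z / math.sqrt(2))                    # 2 * (1 - Phi(z))
    two_phi = 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi)  # 2 * phi(z)
    series, power, odd_product = 0.0, z, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        series += power / odd_product
        power *= x                    # next power z**(2r+1)
        odd_product *= 2 * r + 1      # next product 1*3*...*(2r+1)
    return tail + two_phi * series


if __name__ == "__main__":
    for value in (3.55, 3.27):
        print(value, round(chi2_sf_odd_df(value, 5), 3))
```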

Immediate retention test score. Analysis 
of scores on the 50-item test showed that the 
PPB version resulted in higher immediate 
retention test scores than the PPA, which in 
turn resulted in higher scores than the Pres- 
ent version. Table 1 presents these data. 

It had previously been found, however, 
that Ss’ mechanical aptitude indices (stanine 
scores) correlated highly with test scores. 
Correlations based on these data were in 
agreement, the values being .75, .83, and .78 
for the Present, PPA, and PPB groups, re- 
spectively. Since there was a fairly large 
difference in mean mechanical aptitude scores 
for the three groups (5.13, 5.10, and 5.57), 
an aptitude X patterning analysis of variance 
was computed.* Table 2 presents the results 
of this analysis. 

As Table 2 shows, the variance attributable to patterning was not significant, but that attributable to mechanical aptitude and that attributable to interaction were significant. Examination of the relevant means shows that the significant interaction F value resulted from the fact that Ss for the patterning versions (both PPA and PPB) who had low aptitude indices got poorer test scores than similar Ss for the Present version, and those who had high aptitude indices got better test scores than similar Ss for the Present version.

4 The method of analysis used was that suggested by Snedecor (8) for the case of disproportionate frequencies in subclasses. The additional precaution suggested by Johnson (5), the use of chi square to determine whether actual cell frequencies differ significantly from those expected, was also used. In the analysis reported here, stanine groups 1 and 2 were combined to get a sufficient N, making an 8 × 3 analysis.


Table 2
Analysis of Variance of Test Scores in Relation to Patterning and Mechanical Aptitude
(N = 330)

Source                  Sum of Squares     df     Mean Square
Patterning                    33.61          2        16.80
Mechanical aptitude       18,170.88          7          ?
Interaction                  807.78         14          ?
Within                     9,507.71        306          ?

? Illegible in the source scan.
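Only the patterning mean square survives in the extracted Table 2, but the other mean squares and F ratios follow from SS / df. The sketch recomputes them; these derived values are not the printed ones, and with the disproportionate subclass frequencies mentioned in footnote 4 the authors' exact error term may differ slightly. The derived patterning F below 1, the very large aptitude F, and an interaction F of about 1.86 on 14 and 306 df agree with the significance pattern described in the text.

```python
# Sums of squares and degrees of freedom from Table 2.
TABLE2 = {
    "Patterning": (33.61, 2),
    "Mechanical aptitude": (18_170.88, 7),
    "Interaction": (807.78, 14),
    "Within": (9_507.71, 306),
}


def mean_squares(table: dict[str, tuple[float, int]]) -> dict[str, float]:
    """MS = SS / df for every source."""
    return {src: ss / df for src, (ss, df) in table.items()}


def f_ratios(table: dict[str, tuple[float, int]]) -> dict[str, float]:
    """F = MS(source) / MS(within), using the within-cells term as error."""
    ms = mean_squares(table)
    return {src: ms[src] / ms["Within"] for src in ms if src != "Within"}


if __name__ == "__main__":
    for src, ms in mean_squares(TABLE2).items():
        print(f"{src:20s} MS = {ms:9.2f}")
    for src, f in f_ratios(TABLE2).items():
        print(f"{src:20s} F = {f:6.2f}")
```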


Discussion 


It will be recalled that Dearborn, Johns- 
ton, and Carmichael (2) used a kind of pat- 
terning called “peak stress” in studying the 
learning of meaningful contextual material by 
college students. In the present study, the 
significant interaction effect found between
mechanical aptitude and patterning may well
be related to this. That is, the
more able Ss (the high-aptitude airmen would
probably be fairly comparable to college students) may well profit from a presentation of
this kind, while the less able may not. It should be remembered that no notion
of the rationale for the underlining was given 
Ss; with some indication of this, it is possible 
that the less able Ss might not be so handi- 
capped. 

This study brings out several other points 
of value for future research. First, appar- 
ently rather extreme alterations (at least in 
terms of underlining) can be made in the 
printed page without disturbing Ss. The
PPA version used in the study had 199 words 
underlined and the PPB version had 129, yet 
neither version was judged “less pleasant” to 
read than the Present version (no underlin- 
ing). Second, the underlining as used here 
affected amount read (and, by inference, 
speed of reading) very little. 

Finally, it should be mentioned that delayed retention was also studied though not
reported in this paper. The Ss were retested
(using the same 50-item test) two weeks after 
original testing. Several analyses of variance 
were computed and showed significant vari- 
ance attributable to patterning, but they have 
not been reported due to heterogeneity of the 
variances. Should further planned work also 
show that patterning aids retention, the re- 
sults would appear somewhat similar to such 
early results as those of Calkins (1), Jersild 
(4), and Van Buskirk (9) in finding that 
“vivid” items in learned lists are favored in 
recall. 


Conclusions 


The results of this study indicate that pat- 
terning (underlining of selected words) of 
reading material, even though Ss are given no 
indication of its rationale, may still result in 
somewhat greater immediate retention than 
ordinary material for more able Ss. Less able 
Ss, on the other hand, may be hindered by 
such a presentation if they are not told what 
it means. Further, patterning as used here 
appears to have little effect on either the 
speed with which material is read or on its 
acceptability. 


Received February 23, 1954. 
Height-Width Proportion and Stroke Width in Numeral
Visibility *


Previous researches (4, 5, 16) have shown
that the visibility of numbers is influenced 
by their height and width; the typical finding 
has been that as either height or width was 
increased, visibility increased, within the 
limits of the study. The hypothesis has been 
suggested (16) that these increases in visi- 
bility may be attributed to the resulting in- 
creases in area, rather than to the alteration 
of proportions. It seems reasonable, however, 
that the relative proportions of height and 
width of the number should themselves influ- 
ence visibility, and that a study of this vari- 
able, independent of the area of the number, 
should contribute to the knowledge of how 
to design a more visible set of numbers. 
Stroke width has been studied in a number 
of researches (2, 3, 6, 8, 11, 12, 13, 14, 15).
These data show sufficient agreement that 
there seems little point in further study of 
stroke width per se, but at the same time it is 
likely that the width of the number might in- 
fluence the optimal stroke width, that is, that 
width and stroke width might interact. Con- 
sequently, it has been considered desirable to 
study these two variables simultaneously. 


Procedure 


The apparatus used was the Harvard tachistoscope, 
manufactured by Ralph Gerbrands of Arlington, Mas- 
sachusetts. The Ss were volunteers from the elemen- 
tary psychology laboratory at the University of Min- 
nesota. All Ss demonstrated 95% or better near 
visual acuity on the Keystone Telebinocular (10). 
Two sets of stimulus material were used—a set of 
numerals in 10-point Century type, and a set of 
hand-drawn numerals in the form recommended by 
Brown, Lowery, and Willis (6) and further studied 
by Atkinson, Crumley, and Willis (1). The latter 
were hand drawn in large sizes and reduced photographically. Twelve sets of the numbers were prepared, all combinations of four height-width combinations and three stroke widths. The height-width combinations were set so as to maintain a constant rectangular area covered by the number, with height to width ratios of 10:3, 10:4.5, 10:6, and 10:7.5. The heights and widths of the finished numbers were, in inches, respectively: .1313 × .0396, .1067 × .0483, .0934 × .0558, and .0834 × .0625; the stroke widths were .0167, .0125, and .0083. The stroke width to height ratios ranged from 1:8 to 1:5 for the widest stroke width and 1:16 to 1:10 for the narrowest stroke width, with the various heights of numbers.

1 This paper is taken from part of the writer’s PhD thesis submitted to the University of Minnesota in 1952. The author is indebted for guidance and encouragement to Professors Donald G. Paterson and Miles A. Tinker under whose direction the study was conducted.
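The stimulus dimensions above can be checked with a few multiplications: the four height × width products agree to within a little over 1%, the width per 10 units of height reproduces the 10:3 to 10:7.5 series, and dividing each height by each stroke width reproduces the stroke-to-height ranges quoted. The code is only an arithmetic check on the numbers given; the names are ours.

```python
# Finished numeral dimensions in inches, from the Procedure.
HEIGHT_WIDTH = [(0.1313, 0.0396), (0.1067, 0.0483), (0.0934, 0.0558), (0.0834, 0.0625)]
STROKE_WIDTHS = [0.0167, 0.0125, 0.0083]


def areas() -> list[float]:
    """Area (sq. in.) of the rectangle covered by each numeral form."""
    return [h * w for h, w in HEIGHT_WIDTH]


def width_per_10_height() -> list[float]:
    """Width expressed on a height of 10, rounded to one decimal."""
    return [round(10 * w / h, 1) for h, w in HEIGHT_WIDTH]


def stroke_ratio_range(stroke: float) -> tuple[int, int]:
    """Smallest and largest height/stroke ratio (the '1:n' values) across the four heights."""
    ratios = [h / stroke for h, _ in HEIGHT_WIDTH]
    return round(min(ratios)), round(max(ratios))


if __name__ == "__main__":
    print([f"{a:.5f}" for a in areas()])
    print(width_per_10_height())           # [3.0, 4.5, 6.0, 7.5]
    for s in STROKE_WIDTHS:
        print(s, stroke_ratio_range(s))
```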

Since tachistoscopic presentation of single symbols 
frequently results in a piling up of either no cor- 
rect or all correct scores, with badly skewed dis- 
tributions, a pilot study was conducted to find the 
combination of exposure time and illumination level 
that would result in about half the responses being 
correct. This was found to be 40 msec. at 1 ft.- 
candle for the stimulus material used. 

In the main experiment, each S observed 30 ex- 
posures of 10-point Century type as practice mate- 
rial, followed by 90 exposures of one set of the ex- 
perimental materials. A total of 72 Ss was used for 
six replications of 12 experimental conditions. The 
score for each S was the number of digits correctly 
identified. 

An analysis of variance was carried out separately 
for the results for each number. The data from the 
practice period were used by covariance analysis to 
adjust the means and sums of squares for differ- 
ences in ability between groups of Ss. The as- 
sumptions of homogeneity of variance and normality 
of distribution were examined by probit analysis as 
suggested by Johnson (9). The assumption of line- 
arity of regression was examined by plotting prac- 
tice performance against experimental performance. 

The assumptions of normality of distribution and 
homogeneity of variance appeared to be satisfied for 
all but two sets of the data—those for the 0 and 
8. In these cases an excess of zero scores for the 
two taller, narrower height-width combinations ren- 
dered the assumptions questionable. Since no trans- 
formation would remedy this difficulty, and since 
the data for the two shorter, wider height-width 
combinations were normal in both cases, separate 
analyses were carried out for the complete data and 
for the portion of the data meeting the assumptions 
in these two cases. All of the data satisfied the as- 
sumption of linearity of regression between practice 
and experimental data. 

The adjusted sums of squares were computed by 
an approximate procedure outlined by Cochran and 
Cox (7, p. 81). In any case in which the F ratio
approached or barely reached significance the result 
was verified by the exact test suggested by the same 
authors (7, pp. 80-81). As an aid in judging the 
practical importance of the differences between means, 
a standard deviation was computed from the error 
mean square of each analysis of variance table. 
Row and column means were also adjusted for the 
regression of experimental performance on practice 
performance by a method set forth by Cochran and 
Cox (7, p. 78). 
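The covariance adjustment removes differences in practice performance from the experimental means. A minimal one-way version, assumed here as a simplification of the Cochran and Cox procedure used in the study (which took the regression coefficient from the error line of the full analysis), pools the within-group slope b of experimental on practice scores and sets adjusted mean = group mean − b × (group practice mean − overall practice mean). The function name and the data in the usage example are hypothetical.

```python
from statistics import fmean


def adjusted_means(groups: dict[str, list[tuple[float, float]]]) -> dict[str, float]:
    """groups maps a condition label to (practice_score, experimental_score) pairs."""
    grand_x = fmean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    b = sxy / sxx  # pooled within-group regression of experimental on practice
    adjusted = {}
    for label, pairs in groups.items():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        adjusted[label] = my - b * (mx - grand_x)
    return adjusted


if __name__ == "__main__":
    toy = {"A": [(10, 12), (12, 14), (14, 16)], "B": [(16, 16), (18, 18), (20, 20)]}
    print(adjusted_means(toy))  # {'A': 17.0, 'B': 15.0}
```

In the toy data group A looks worse on raw means (14 vs. 18) but started 6 points lower in practice; after adjustment it comes out 2 points ahead, which is exactly the effect built into the data.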


Results and Discussion 


The results of the analyses of variance and 
covariance are shown in Table 1. It can be 
seen that for all numerals except the 8 the 
differences in visibility due to different stroke 
widths were not significant. For the 8, how- 
ever, the differences were significant beyond 
the 1% level, both for the complete data 
and for the portion of the data that met the 
assumptions for the test. Since the stroke-width values used were those found optimal
in other studies, it is not surprising that most
of the stroke-width differences were not significant.

The differences in visibility attributable to different height-width combinations were significant beyond the 1% level for the 0, 3, 4, 7, and 9, and beyond the 5% level for the partial data of the 0 and for the 6. Height-width proportion, then, is an effective variable in the visibility of numbers.


Table 1
F Ratios from Analyses of Variance for Numerals in All Combinations of Four Heights and Widths and Three Stroke Widths

Numeral     Between Height-Width     Between Stroke     Interaction
Analyzed    Combinations             Widths
0           6.60**                   3.09               ?
0†          4.66*                    ?                  ?
1           1.33                     ?                  ?
2           1.74                     ?                  ?
3           5.04**                   ?                  ?
4           4.63**                   ?                  ?
5           1.75                     ?                  ?
6           3.79*                    ?                  ?
7           6.74**                   ?                  ?
8           ?                        ?                  ?
8†          ?                        ?                  ?
9           ?                        ?                  ?

* Significant at the 5% level.
** Significant at the 1% level.
† Partial data.
? Illegible in the source scan.

The findings with respect to interactions 
are surprising, and perhaps the most impor- 
tant outcome of the study. The interaction 
of height-width proportion and stroke width 
was in no case significant. This is important 
in indicating that further study of height- 
width proportion may be carried out without 
reference to stroke width. 

Which combination of height-width pro- 
portion and stroke width is most visible is a 
more difficult question to answer. The mean 
number of correct readings per S of each 
numeral for each height-width proportion and 
stroke width, adjusted for Ss’ practice per- 
formance, is shown in Table 2. 

Several considerations favor using the 
height-width ratio of 10:7.5. This would 
minimize confusions resulting from the least 
visible number, the 0. The trend of visibility 
for the greatest number of digits would re- 
quire that the height-width ratio of either 
10:6 or 10:7.5 be used. As the proportions 
changed from 10:6 to 10:7.5, declines in visi- 
bility resulted for the 1, 2, 5, and 8; but these 
were numbers for which height-width propor- 
tion was not a significant variable. The 4 
and 7 also show slight declines, but these are 
almost certainly not the changes that resulted 
in significance, and are too small to be of 
practical importance in any case. The only 
numbers for which one of the two widest com- 
binations was not the most visible are the 5 
and 9. In the case of the 9 it is likely that 
the differences which were significant were 
those between the narrowest combination and 
the other three, and that the differences be- 
tween the other three are chance differences. 
None of the differences was significant for the 
5. So it would appear that the height-width 
ratio of 10:7.5 is the most visible combina- 
tion of those studied. It is to be noted, how- 
ever, that several of the numbers showed con- 
sistent trends of increasing visibility as width 
increased, to the limit of the study, and the 
possibility remains that a still wider numeral 
form would be even more visible. 

The most visible stroke width appeared to be the narrowest studied. For the 0 and 8 the narrowest stroke width was the most visible, and for the 6 the narrowest and intermediate stroke widths were about equal and superior to the widest stroke width. These were the only numerals in which stroke-width differences were significant or approached significance. Although a greater number of digits was most visible in the widest stroke width studied, the differences in visibility were small and in no case significant. Fewer of the numbers were most visible in the narrowest stroke width, but all of the significant differences, and the largest ones, are in this direction.


Table 2
Average Number of Correct Responses Per Subject for Numbers in Different Heights, Widths, and Stroke Widths

Rows: numerals 0 through 9 (with partial data† for the 0 and 8), plus a standard-deviation row‡.
Columns: Combinations of Height and Width (Inches)*: .1313/.0396/10:3; .1067/.0483/10:4.5; .0934/.0558/10:6; .0834/.0625/10:7.5. Stroke Width (Inches): .0167; .0125; .0083.

[Cell values are illegible in the source scan.]

* The first entry in each of these four subheadings is the height; the second, the width; and the third, the height-width ratio.
† Partial data.
‡ Square root of the error mean square from the analysis of variance, which represents the average variation of all cells of the analysis table.

The set of numbers drawn in the optimal 
combination of height-width proportion and 
stroke width is shown in Fig. 1. 


Fig. 1. The most visible numerals.


Summary 


The aim of the study was to test the effect 
of different combinations of height and width, 


with area held constant, on the visibility of 
numerals. Since stroke width might be ex- 
pected to interact with height and width, it 
was studied simultaneously. A total of 72 
Ss provided six replications for all combina- 
tions of height-width proportion at four lev- 
els, and stroke width at three levels. The 
data were treated by analysis of variance and 
covariance. 

1. Height-width proportion was a source of 
variation in visibility significant beyond the 
1% level for five numerals, and beyond the 
5% level for a sixth. 

2. Stroke width was a significant source of 
variation for only one numeral. 

3. The most visible combination of height, 
width, and stroke width for all numbers is a 
height-width ratio of 10:7.5 and a stroke 
width to height ratio of 1:10. 

4. Height-width proportion and stroke width show no interaction.
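Holding area constant while varying the height-width proportion fixes both dimensions of each numeral. A minimal sketch of that arithmetic follows; it assumes the constant area is the product of the 10:7.5 dimensions in Table 2 (.0834 x .0625 in.), and `dimensions_for_ratio` is a hypothetical helper, not a procedure described by the authors.

```python
import math


def dimensions_for_ratio(area: float, width_to_height: float) -> tuple[float, float]:
    """Return (height, width) of a rectangle with the given area and width:height ratio."""
    # area = h * w and w = ratio * h, so h = sqrt(area / ratio)
    height = math.sqrt(area / width_to_height)
    return height, width_to_height * height


# Assumption: the constant area is taken from the 10:7.5 column of Table 2 (.0834 x .0625 in.).
AREA = 0.0834 * 0.0625

for ratio in (0.45, 0.60, 0.75):
    h, w = dimensions_for_ratio(AREA, ratio)
    print(f"10:{ratio * 10:g}  height = {h:.4f} in.  width = {w:.4f} in.")
```

The printed heights and widths agree with the tabled .0934 x .0558 (10:6) and the .0483 width (10:4.5) to within about .0002 in., which is consistent with area having been held constant.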
Flesch Readability Formula Applied to Television
Programs 


Rudolph H. Vancura 


University of Omaha 


This study is concerned with measuring 
the audible vocabulary of television programs 
with the Flesch yardstick (1). No Flesch
analysis of audible television vocabulary had
been formally attempted before this work,
which therefore breaks new ground in Flesch
readability research. Applying the Flesch
formula makes possible an objective measure-
ment of audible television vocabulary which
may be compared with, and evaluated against,
other criteria.


Procedure 


Two categories of television programs were
chosen: (a) daytime (6 A.M. through 6 P.M.), local,
Monday-through-Friday, adult-interest (personally
determined) programs with one main person talking;
and (b) evening (6 P.M. through midnight), network,
once-a-week programs.1 The former category con-
sisted of eight programs, whereas a random sample
of 25 of the latter was taken because of the large
number of programs (81) in that category.

The television programs in this study were pro- 
cured from the program schedules of WOW-TV for 
the week of February 1 through February 7, 1953, 
and KMTV for the week of March 8 through March 
14, 1953. These two stations served as the only 
television stations in Omaha, Nebraska, at the time 
of this writing. Since all of the programs which
were studied were listed on these two program
schedules, it is understood that the “local” programs
originated only at one or the other of these stations
and that the “network” programs were those chosen
for telecast by these stations.

After the programs were selected, the following 
procedure was put into effect: (a) On a chance oc- 
casion, the selected programs were tape recorded in 
their entirety once; (b) five 100-word samples were 
then taken at random from each program and tran- 
scribed into written sentences; and (c) the Flesch 
formula for readability (2) was applied to the writ- 
ten random samples from the selected programs. 
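The paper cites the Flesch formula (2) but does not reproduce it. A minimal sketch, assuming the 1948 revised formulas (Reading Ease = 206.835 - 0.846 wl - 1.015 sl, where wl is syllables per 100 words and sl is average sentence length in words; Human Interest = 3.635 pw + 0.314 ps, where pw is personal words per 100 words and ps is personal sentences per 100 sentences). The counts in the usage lines are hypothetical, not taken from any program in the study.

```python
def reading_ease(syllables_per_100_words: float, words_per_sentence: float) -> float:
    """Flesch Reading Ease (1948 form): higher scores mean easier material."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence


def human_interest(personal_words_per_100: float, personal_sentences_pct: float) -> float:
    """Flesch Human Interest (1948 form): higher scores mean more personal material."""
    return 3.635 * personal_words_per_100 + 0.314 * personal_sentences_pct


# Hypothetical 100-word transcript sample: 140 syllables, 5 sentences,
# 10 personal words, and 2 of the 5 sentences counted as personal.
words, syllables, sentences = 100, 140, 5
re_score = reading_ease(syllables * 100 / words, words / sentences)
hi_score = human_interest(10 * 100 / words, 2 * 100 / sentences)
print(round(re_score), round(hi_score))  # 68 49
```

Presumably the five sample scores for each program were then averaged to give the program means; note that the Human Interest score is not capped at 100, which fits the values above 100 in Table 2.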

The data are shown in Tables 1 and 2: Table 1
gives the scores for the Selected Daytime programs
and Table 2 those for the Selected Evening programs.
The data in both


1For convenience these programs henceforth will 
be referred to as “Selected Daytime” programs and 
“Selected Evening” programs respectively. 
Tables 1 and 2 consist, as in any Flesch appli-
cation, of Mean Reading Ease scores and Mean
Human Interest scores, tabulated and rounded
to the nearest whole number for the programs
in this study.

The data were also processed by means of the 
rank-order correlation technique to see if any mean- 
ingful relationships existed between various com- 
binations of scores, including the relation between 
Flesch scores and ratings of audience attraction 
(Telepulse, 3). The Telepulse ratings provide popu- 
larity ratings which are used commercially and which 
were taken at about the time of recording. 
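Spearman's rank-order coefficient is the Pearson correlation of the ranks; with no ties it reduces to 1 - 6Σd²/[n(n² - 1)]. The sketch below assumes tied scores receive the average of their ranks; the paper does not say how ties, such as the four evening programs with a Reading Ease score of 92, were handled. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Rank-order correlation: the Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5
```

For example, `spearman_rho([3, 1, 4, 2, 5], [1, 2, 3, 4, 5])` returns 0.5, the same value the 1 - 6Σd²/[n(n² - 1)] shortcut gives when there are no ties.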


Table 1 


Reading Ease and Human Interest Scores Obtained 
for the Selected Daytime Programs 


Mean 
Human 
Interest 

Score 


Mean 
Reading 
Ease 


Daytime Program Score 


Your TV Home 
Cup and Saucer 


90 
83 
82 
82 
72 
63 
61 
TV Farm Reporter
Martha’s Kitchen 
Woman’s View 
Noon Edition 
Midday News 


TV Classroom 


Mean score 
SD 


73 
2 


Results 


The rank-order correlation between the
Reading Ease scores and the Human Interest
scores of Table 1 was +.89, significant at
the .01 level.

The rank-order correlation between the
Reading Ease scores of Table 1 and the
Telepulse ratings (3) for these programs was
-.12, nonsignificant at the .05 level.

The rank-order correlation between the
Human Interest scores of Table 1 and the
Telepulse ratings for these programs was
-.19, nonsignificant at the .05 level.
Table 2


Reading Ease and Human Interest Scores Obtained 
for the Selected Evening Programs 








Mean Mean 
Reading Human 

Ease Interest 

Score Score 
Goodyear TV Playhouse 98 90 
Suspense 96 93 
Beulah 96 106 
Cisco Kid 95 109 
Texaco Star Theater 94 71 
City Hospital 94 83 
Fred Waring Show 93 81 
Mr. & Mrs. North 82 
Winchell & Mahoney 92 80 
Red Skelton 92 77 
Down You Go 92 67 
The Web 92 81 
Toast of the Town 91 80 
Studio One 91 83 
Douglas Fairbanks Jr. Presents : 86 
Jackie Gleason Show 88 
Playhouse of Stars 
Twenty Questions 63 
Tales of Tomorrow 72 
Hollywood Screen Test 76 
Ford Theater 73 
You Asked For It 77 
Life Begins at Eighty 79 
Boxing (Friday) 
Life is Worth Living 


Evening Program 





Mean score 
SD 


The rank-order correlation between the
Reading Ease scores and the Human Interest
scores of Table 2 was +.59, significant at
the .01 level.

The rank-order correlation between the
Reading Ease scores of Table 2 and the
Telepulse ratings for these programs was
+.35, nonsignificant at the .05 level.

The rank-order correlation between the
Human Interest scores of Table 2 and the
Telepulse ratings for these programs was
+.37, nonsignificant at the .05 level.
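The paper does not state how the coefficients were tested. One common approach, sketched here as an assumption, is the t approximation t = rho * sqrt((n - 2)/(1 - rho²)) with n - 2 degrees of freedom, compared against two-tailed critical values copied from a standard t table. With n = 8 daytime and n = 25 evening programs (assuming Telepulse ratings were available for every program), it reproduces every significance decision reported above. For samples as small as eight, exact tables of rho are preferable.

```python
import math


def rho_t_statistic(rho: float, n: int) -> float:
    """t approximation for testing a rank-order correlation against zero (df = n - 2)."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


# Two-tailed critical values of t from a standard table, only for the df needed here.
T_CRIT = {(6, 0.01): 3.707, (6, 0.05): 2.447, (23, 0.01): 2.807, (23, 0.05): 2.069}


def significance(rho: float, n: int) -> str:
    """Return ".01", ".05", or "n.s."; raises KeyError for df values not in T_CRIT."""
    t = rho_t_statistic(rho, n)
    df = n - 2
    if abs(t) > T_CRIT[(df, 0.01)]:
        return ".01"
    if abs(t) > T_CRIT[(df, 0.05)]:
        return ".05"
    return "n.s."


reported = [  # (comparison, rho, n) as given in the Results
    ("Table 1: Reading Ease vs Human Interest", 0.89, 8),
    ("Table 1: Reading Ease vs Telepulse", -0.12, 8),
    ("Table 1: Human Interest vs Telepulse", -0.19, 8),
    ("Table 2: Reading Ease vs Human Interest", 0.59, 25),
    ("Table 2: Reading Ease vs Telepulse", 0.35, 25),
    ("Table 2: Human Interest vs Telepulse", 0.37, 25),
]
for label, rho, n in reported:
    print(f"{label}: t({n - 2}) = {rho_t_statistic(rho, n):.2f}, {significance(rho, n)}")
```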


Summary 


The measurement of the eight daytime
(6 A.M. through 6 P.M.), local (Omaha, Ne-
braska), Monday-through-Friday, adult-interest
(personally determined), one-main-person-
talking television programs in this study
with the Flesch readability formula revealed
that these programs were, for the most part,
communicating an audible vocabulary which
was relatively simple and which contained a
relatively large amount of human interest.

The measurement of the random sample of
25 evening (6 P.M. through midnight), network,
once-a-week television programs in this study
with the Flesch readability formula revealed
that these programs were, for the most part,
communicating an audible vocabulary which
was very simple and which contained a very
large amount of human interest.

Although Reading Ease scores and Human
Interest scores for the same programs in this
study correlated positively and significantly
with each other, no significant relationship
existed between Reading Ease scores and
Telepulse ratings, or between Human Interest
scores and Telepulse ratings, for the same
programs.
Improving the Prediction of Academic Achievement by Use
of the MMPI * 


J. W. Frick 


University of Southern California 


In this study the following hypothesis was 
tested: in a relatively homogeneous college 
population the prediction of academic achieve- 
ment will be enhanced, over that afforded by 
a test of aptitude alone, by the inclusion of 
personality variables in a predictive battery 
and the utilization of all information avail- 
able in the data by means of correlational 
procedures. 


Method 


The experimental sample was composed of 267 
freshman women at the University of California, 
Santa Barbara College, who had completed two se- 
mesters of the freshman year in 1951 and 1952. 
Their mean age was 18.3 years. At matriculation, 
they had taken (as standard procedure) the Ameri- 
can Council on Education Psychological Examina- 
tion (ACE) and the Minnesota Multiphasic Person- 
ality Inventory (MMPI). In order to be included 
in the sample, each S’s validating scales on the 
MMPI had to be within acceptable limits for re- 
search purposes. 

As a check on the similarity of this sample to 
other samples of college women, a mean profile of 
the eight clinical scales of the MMPI employed was
plotted for our Ss against two similar profiles, one 
provided by McKinley and Hathaway (5) and one 
by Lough (4). All three profiles are well within 
the limits of ±1 SD (in T scores), with only one
scale (Pt) showing a range of as much as 5 raw- 
score points. It therefore seemed reasonable to sup- 
pose that with regard to this particular personality 
inventory, the Santa Barbara group may be consid- 
ered comparable to a Midwestern and an Eastern 
group of similar composition as regards sex, age, and 
intellectual level (1). 

While it is recognized that academic success may 
be measured by other criteria than grade-point av- 
erage (GPA) (e.g., general adjustment, social ac- 
tivity, etc.), GPA is the most easily measured, and is 
the basic criterion for admission to further educa- 
tion. Moreover, since there tends to be a positive 
correlation between favorable psychological variables, 
it appears likely that good personality adjustment 


1 Based on a master’s thesis presented to the De- 
partment of Psychology, University of Southern 
California. The author is indebted to Dr. J. P. 
Guilford for constructive criticism and helpful guid- 
ance in the conduct of this study, and to Dr. Helen 
M. Sweet, Dean of Women, Santa Barbara College, 
for valuable assistance in the collection of data. 
(as indicated by the MMPI) would be reflected in
GPA as well as in over-all academic achievement. 

ACE scores for the group were the percentile 
scores attained as members of a group of approxi- 
mately 580 men and women. The distribution 
ranged from .74 to 99.7, and was both symmetrical 
and obviously platykurtic. Percentile scores were 
used because of their availability, and because, when 
plotted against another symmetrical distribution, 
both the linearity and homoscedasticity required for 
the Pearsonian coefficient are present. Moreover, 
under these conditions, the difference between co- 
efficients computed from raw scores and percentile 
scores may be expected not to exceed .02.² The
large SD (Table 1) appears to indicate a variability 
within the group that should contribute to good 
discrimination among its members. 


Table 1
Means and Standard Deviations of
Experimental Variables

Variable     SD
GPA          10.0 (0.5)*
ACE          27.7
Hs            2.9
D            10.0 (4.3)*
Hy            4.0
Pd            3.3
Pa            2.5
Pt            4.5
Sc            4.3
Ma            3.8

* Distribution normalized by T scaling. Figures in parentheses are original means and SD's.

Raw scores, with K added, were used for all 
MMPI scales with the exception of the D scale. 
The distribution of the latter was positively skewed, 
whereas all others were symmetrical; therefore the 
D scores were normalized by T scaling. The vali- 
dating scales of the MMPI were not included in the 
prediction battery since definite information as to 
their clinical diagnostic value is lacking. The Mf 
scale was not used since the population was homo- 
geneous with respect to sex, and there appears to be 
some doubt as to the clinical significance of this 


2 Guilford, J. P. Personal communication, 1954. 
Table 2


Correlation Matrix of Selected Variables for Computation of Regression Weights 








ACE Hs D 
Variable 2 3 4 


Pt Sc Ma GPA†
8 9 10 1 





ACE         -.04   -.08
Hs    -.04          .29
D     -.08   .29
Hy    -.12   .64    .18
Pd    -.16   .21    .27
Pa    -.03   .14    .32
Pt    -.07   .38    .64
Sc    -.02   .42    .42
Ma    -.20   .00   -.08
GPA    .48  -.15   -.04




— .16 — .20 .48** 
21 : 38 42 .00 —.15* 
.27 j .64 42 — .04 
31 ‘ 30 40 — .08 

.22 35 
36 35 43 
.22 se 65 
35 f .65 
11 ‘ 07 | 
— 32 





* Significant at the .05 level. 
** Significant at the .01 level. 


† Correlation with other variables corrected for attenuation in the criterion (GPA) only.


scale (2). The GPA distribution was positively 
skewed and was therefore normalized. 


Results 


The correlation between GPA’s for the two 
semesters was .76. Since the grading prac- 
tice at Santa Barbara College is based largely 
on objective tests, it is felt that this some- 
what low reliability, besides reflecting errors 
of measurement, largely indicates function 
variability within individuals. Since the cri- 
terion (GPA) was based on the average of 
the grades for two semesters, the correlation 
was corrected by the Spearman-Brown formula
to .86. This r was then used in the
correction for attenuation (in GPA only) of 
the correlations between GPA and the other 
variables. Table 2 shows the Pearsonian co- 
efficients among variables, arranged for com- 
putation of regression weights. Four of the 
validity coefficients are significant at the .01 
level, and two at the .05 level. The Pa scale 
of the MMPI and the ACE scores are the 
only variables showing positive correlation 
with GPA, while those for D and Pt ap- 
proach 0. All scales of the MMPI are nega- 
tively correlated with the ACE score. 
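The two reliability steps in this paragraph can be sketched with the standard Spearman-Brown and attenuation formulas. The step-up reproduces the reported .86 from the semester-to-semester r of .76. The uncorrected validity of .445 in the last line is hypothetical, chosen only to show the size of the correction, since the paper reports the corrected coefficients.

```python
import math


def spearman_brown(r: float, k: float = 2.0) -> float:
    """Reliability of a measure k times as long as one whose reliability is r."""
    return k * r / (1 + (k - 1) * r)


def correct_for_attenuation_in_criterion(r_xy: float, r_yy: float) -> float:
    """Disattenuate a validity coefficient for unreliability in the criterion only."""
    return r_xy / math.sqrt(r_yy)


r_gpa = spearman_brown(0.76)  # semester-to-semester r stepped up to a two-semester average
print(f"{r_gpa:.2f}")  # 0.86
# Hypothetical uncorrected validity of .445, corrected with the rounded reliability .86:
print(f"{correct_for_attenuation_in_criterion(0.445, 0.86):.2f}")  # 0.48
```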

Table 3 indicates the beta weights, with b
weights, and the percentage of contribution of
each variable to the multiple correlation (R),
calculated as βj r1j, the product of each vari-
able's beta weight and its correlation with GPA
(variable 1). The b weights were converted
into simplified integral weights for use in the
derivation of an arbitrary composite score.
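A sketch of how the Table 3 columns relate to one another, under the standard least-squares identities: each predictor's share of criterion variance is βj r1j (these shares sum to R² for the full composite), and a raw-score weight is b = β × SD(criterion)/SD(predictor). The ACE inputs (β = .4097, r = .48, SD of GPA = 10.0 in T-score units, SD of ACE = 27.7) are read from Tables 1 to 3. The function names are illustrative, and agreement with the tabled values is only approximate because the published inputs are rounded.

```python
def percent_contribution(beta: float, validity_r: float) -> float:
    """Share of criterion variance (in %) credited to one predictor: 100 * beta_j * r_1j."""
    return 100 * beta * validity_r


def raw_score_weight(beta: float, sd_criterion: float, sd_predictor: float) -> float:
    """Convert a standard-score (beta) weight into a raw-score (b) weight."""
    return beta * sd_criterion / sd_predictor


# ACE: beta = .4097, r with GPA = .48, SD(GPA) = 10.0 (T scaled), SD(ACE) = 27.7
print(f"{percent_contribution(0.4097, 0.48):.2f}")    # 19.67 (tabled: 19.66)
print(f"{raw_score_weight(0.4097, 10.0, 27.7):.4f}")  # 0.1479 (tabled: .1476)
```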


The coefficient of multiple correlation de- 
rived from this procedure was .64 when all 
variables were employed. Elimination of the 
Hy and Pt scales of the MMPI resulted in 
an R of .65. These two scales, while not ob- 
viously contributing to R when they were in- 
cluded in the composite, probably contributed 
to multiple prediction by acting as suppressor 
variables. The R corrected for shrinkage 
(cR) was found to be .64, a reduction of 
only .01. 
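The paper does not say which shrinkage formula was used. One common choice is Wherry's adjustment, R̄² = 1 - (1 - R²)(N - 1)/(N - k - 1), sketched below. With N = 267 and, as an assumption, k = 7 predictors (apparently the variables 2, 3, 4, 6, 7, 9, and 10 identified for the corrected battery in Table 4), an uncorrected R of .65 shrinks to about .64. This is consistent with the reported reduction of .01.

```python
import math


def shrunken_r(r: float, n: int, k: int) -> float:
    """Wherry-type adjustment of a multiple R for k predictors and n cases."""
    adj_r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(adj_r2, 0.0))  # clamp: the adjusted R**2 can fall below zero


print(f"{shrunken_r(0.65, 267, 7):.2f}")  # 0.64
```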

The coefficient of determination (d) when
GPA is predicted from ACE alone was found
to be .23, while the coefficient of multiple de-
termination (R²) was equal to .41, indicating


Table 3

Extracted Weights of Predictive Battery

                        Percentage
             Beta       Contributed    b          Integral
Variable     Weight     to GPA         Weight     Weight*
ACE           .4097     19.66           .1476      1
Hs           -.1747      2.62          -.5490     -3
D            -.0125       .05          -.0125      0
Hy            .1512      1.20           .3818      2
Pd           -.2992      9.57          -.9012     -5
Pa            .2552      3.32          1.0127      5
Pt            .1552       .31           .3456      2
Sc           -.1984      3.17          -.4592     -2
Ma           -.1595      4.31          -.4253     -2

* Integral weights are for use in an arbitrary selection scale.
If prediction of GPA is desired, b weights must be used in the
regression equation. The a constant for the regression equation
for these data is 79.60.










that the ACE alone accounts for only 23%
of the variance in GPA, while a weighted
combination of ACE and MMPI scales can
account for 41%. The index of forecasting
efficiency (E) derived from ACE alone is
12%, while that from the selected battery is
23%.
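Assuming the conventional definitions, the coefficient of determination is r² and the index of forecasting efficiency is E = 1 - sqrt(1 - r²). Under those definitions, the figures reported here can be checked from r = .48 (ACE alone) and R = .64 (selected battery):

```python
import math


def forecasting_efficiency(r: float) -> float:
    """Index of forecasting efficiency E = 1 - sqrt(1 - r**2), as a proportion."""
    return 1 - math.sqrt(1 - r ** 2)


for label, r in [("ACE alone", 0.48), ("selected battery", 0.64)]:
    print(f"{label}: d = {r ** 2:.2f}, E = {forecasting_efficiency(r):.0%}")
# ACE alone: d = 0.23, E = 12%
# selected battery: d = 0.41, E = 23%
```

E grows much more slowly than r: nearly doubling the variance accounted for (23% to 41%) only moves E from 12% to 23%.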

In order to investigate the relationship be- 
tween MMPI scale scores and ACE scores, 
ACE was chosen as the dependent variable 
and MMPI scores as the independent vari- 
ables. Table 4 includes the various statistics 
involved in the prediction of ACE scores from 
weighted composites of MMPI scores. The 
derived cR, using optimal beta weights ap-
plied to four MMPI scores, was found to be
.25.

Since there is some speculation as to dif- 
ferences between Quantitative (Q) and Lin- 


Table 4

Correlational Measures

                                        r or                       d or
Battery                                 cR      SE      SEest     R²      E
r (ACE with GPA)                        .48*    .047    .878      .23     12%
cR (variables 2, 3, 4, 6, 7, 9, 10)     .64†    .037    .768      .41     23%
cR (MMPI scales predicting ACE)         .25†    .058    26.84     .06      3%

* Corrected for attenuation in the criterion (GPA) only.
† Corrected for shrinkage.


guistic (L) abilities as measured by the re- 
spective subtests of the ACE, scores on these 
variables were correlated separately with 
GPA and the MMPI scales. Q and L scores 
correlated .38 and .39 respectively with GPA, 
indicating that if differences in these abilities 
do exist, they differ only slightly in relation 
to achievement in this population. It is prob- 
able that there exist in each subtest common 
factors that will enable the high scorer in 
either subtest to perform at a high level in 
the academic situation. There was no sig- 
nificant difference between the correlations of 
Q and L with any MMPI scale, so that, con- 
sidering the similarity of correlations with 
GPA noted above, these subtests were not in- 
cluded separately in the correlation matrix 
and in the composite. 


Discussion 


In this study, aptitude as measured by the 
ACE yields the highest zero-order correlation 
with GPA. Since the ACE also may be re- 
garded as a kind of achievement test, it is 
equally subject to the effects of personality 
variables, as evidenced by the negative corre- 
lations with the MMPI scales. It is prob- 
ably for this reason that, when in a weighted 
composite with such personality variables, 
the ACE can account for only 19.7% of the 
variance in GPA. 

The negative correlations between GPA 
and the majority of the MMPI scales may be 
understood by examination of the scale de- 
scriptions (3). Immaturity and lack of in- 
sight (Hs), asocial and disinterested attitudes 
(Pd), existence in a fantasy world (Sc), and 
hypomanic activity (Ma) can scarcely be 
considered as conducive to academic achieve- 
ment. The positive correlation between GPA 
and Pa is less understandable. Possibly the 
individual scoring high on this scale is sus- 
picious and oversensitive within a guarded 
normality and reacts more positively to a 
competitive situation. This study indicates 
that, whether the influence of the individual 
adjustment scales is positive or negative, 
their inclusion in a prediction battery with 
an aptitude test is justified by the resultant 
increase in the coefficient of determination. 

The experimental sample employed in this 
study was relatively homogeneous with re- 
spect to age, sex, and intellectual level. It 
is not expected that use of the weights herein 
derived will accurately predict achievement 
in a more heterogeneous population, espe- 
cially with respect to the sex variable. How- 
ever, the results suggest experimental cross-
validation and investigation of the relation-
ship of a similar composite to final as well
as freshman academic achievement.


Summary 


1. Scores obtained on the ACE and MMPI
clinical scales (Mf excluded) by 267 fresh-
man women at the University of California,
Santa Barbara College, were employed as the
independent variables in a multiple-correla-
tion procedure, with grade-point average (cor-
rected for errors of measurement) for two
semesters of the freshman year as the de-
pendent variable.

2. The combination of variables yielding 
the highest multiple correlation gave an R of 
.64, corrected for shrinkage. The zero-order 
correlation between aptitude, as measured by 
the ACE, and GPA was .48, corrected for at- 
tenuation in the criterion only. 

3. The coefficient of determination afforded 
by ACE alone was equal to .23, while that 
derived from the selected battery was .41. 
The 12% index of forecasting efficiency af- 
forded by ACE alone was increased to 23% 
by inclusion of certain MMPI scales. 

4. Additional correlations with ACE as the 
dependent variable and MMPI scales as in- 
dependent variables suggest that personality 
factors affect performance in the form of 
ability scores obtained at any given time as 
well as performance over a longer period in 
the form of grades. 

5. Because of the relative homogeneity of 
the experimental population with respect to 


age, sex, and schooling, application of the 
derived weights for the variables is recom- 
mended with reservations to more hetero- 
geneous populations or to male groups. A 
cross-validation study in a similar population 
is planned. 
References

1. Frick, J. W. The prediction of academic achievement in a college population from a test of aptitude and a standardized personality inventory. Unpublished master's thesis, Univer. of Southern California, 1953.

2. Gough, H. G. Diagnostic patterns on the MMPI. J. clin. Psychol., 1946, 2, 23-37.

3. Hathaway, S. R., & McKinley, J. C. Manual for the MMPI. (Rev. Ed.) New York: Psychological Corp., 1951.

4. Lough, O. M. Teachers college students and the MMPI. J. appl. Psychol., 1946, 30, 241-247.

5. McKinley, J. C., & Hathaway, S. R. A multiphasic personality schedule: II. A differential study of hypochondriasis. J. Psychol., 1940, 10, 255-268.





The Journal of Applied Psychology 
Vol. 39, No. 1, 1955 
Kuder Preference Record Validation


John W. Magill 
University of Pittsburgh 


The Kuder Preference Record as a measure 
of interests has wide usage and acceptance. 
It was developed somewhat differently than 
was the Strong Vocational Interest Blank, 
the other of the two most widely used in- 
ventories, in that the Kuder procedure relied 
more upon a priori judgments and internal 
consistency than upon relationships to ex- 
ternal criteria. 

Questions have been asked as to the mean- 
ing of the Kuder scales, and numerous studies 
have appeared concerning the nature and sig- 
nificance of the traits measured. Super (10) 
and Kuder (7) describe the Preference Rec- 
ord and list variables to which Kuder scores 
have been related, e.g., to occupations, in- 
telligence, aptitudes, school grades, person- 
ality, etc. 

Nothing was found in the literature to
indicate that the external criterion of interests
manifested by participation in college extra-
curricular activities had been used for validating
the Kuder. While this criterion is subject to
some of the same vagaries of opportunity,
status, and social pressure as are those of
elected majors or occupations, the factor of
self-selection seems about as great as is
likely to be found.


Problem 


The study was framed in the following 
questions: (a) Does the Kuder Preference 
Record, as a measure of interests of college 
students, distinguish between certain activity 
groups? Are there representative profiles for 
the activities? (b) If such profiles can be
demonstrated, how do they compare on spe- 
cific scales with logically assumed relation- 
ships? For example, do the musical activi- 
ties groups have high musical scores; is the 

1 This article is based on part of a Ph.D. dis-
sertation done under the direction of Professor
George L. Fahey, University of Pittsburgh. The
writer is deeply indebted to Dr. Fahey for his
guidance.


leadership group high on the persuasive scale, 
etc.? 


Procedure 


The various extracurricular groups on the campus 
of the University of Pittsburgh in the academic year 
1951-1952 were surveyed as to the characteristics of 
the activity and the number of participants. Cer- 
tain groups such as the Camera Club, Radio Club, 
and Fine Arts Society were eliminated because the
samples were too small for statistical treatment.
Some smaller groups were combined into larger ones 
which could be given general names in keeping with 
the nature of the activity manifested. 

The membership of the groups included freshmen, 
sophomores, juniors, and seniors. However, the
Kuder Preference Record Form BM scores for all 
individuals were secured when they were freshmen 
and were available from the University Testing 
Service files of examinations given to entering fresh- 
men. 

Means and standard deviations of the Kuder raw 
scores were calculated for each of the activity groups. 
Composite means for the combined activity groups 
were also determined; male and female groups were 
kept separate. 

To determine whether the over-all or total profile 
differences between the activity groups were sta- 
tistically significant, a modified three-way analysis 
of variance was made for the males and females 
separately. This technique is described by Block, 
Levine, and McNemar (2) who state that it is an 
over-all test for the existence of psychometric pat- 
terns. The Kuder raw scores were converted to 
stanine scores, meeting the requirement of normal- 
ized standard scores for the procedure. 
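The paper does not say how the raw scores were converted to stanines. A minimal sketch, assuming the usual stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4) applied to mid-point percentile ranks within the group being converted. The author may instead have used the Pitt norm distribution as the reference, and `to_stanines` is a hypothetical helper name.

```python
from bisect import bisect_right

# Cumulative percentages at the upper edges of stanines 1-8
# (stanine sizes 4, 7, 12, 17, 20, 17, 12, 7, 4 percent).
STANINE_EDGES = [4, 11, 23, 40, 60, 77, 89, 96]


def to_stanines(scores):
    """Convert raw scores to stanines (1-9) from mid-point percentile ranks in the group."""
    n = len(scores)
    result = []
    for x in scores:
        below = sum(1 for s in scores if s < x)
        ties = sum(1 for s in scores if s == x)
        pct_rank = 100 * (below + ties / 2) / n  # tied scores share one percentile rank
        result.append(bisect_right(STANINE_EDGES, pct_rank) + 1)
    return result
```

For 100 distinct raw scores, this assigns 4, 7, 12, 17, 20, 17, 12, 7, and 4 cases to stanines 1 through 9, reproducing the intended normalized distribution.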

Graphs were made for each of the activity groups. 
The Pitt Kuder raw score mean (general Pitt norms) 
for each scale was the zero or reference point, and 
the deviations of the activity group means above or 
below this were shown in terms of tenths of the 
Pitt standard deviations for each scale. Also, t
tests of statistical significance were made of each
deviation from the zero or reference point. 
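The plotting unit and the significance test described here can be sketched as below. The deviation is expressed in tenths of the Pitt SD for the scale. The t test is assumed to be a one-sample test of the activity-group mean against the Pitt mean treated as fixed; the paper does not give the formula. Because activity-group members are also part of the norm population, this is an approximation. The numbers in the usage lines are hypothetical, not values from Table 1.

```python
import math


def deviation_in_tenths(group_mean: float, norm_mean: float, norm_sd: float) -> float:
    """Distance of a group mean from the norm mean, in tenths of the norm SD."""
    return 10 * (group_mean - norm_mean) / norm_sd


def t_against_norm(group_mean: float, group_sd: float, n: int, norm_mean: float) -> float:
    """One-sample t for a group mean against a fixed reference mean (df = n - 1)."""
    return (group_mean - norm_mean) / (group_sd / math.sqrt(n))


# Hypothetical scale: norm mean 50, norm SD 20; group of 25 with mean 60, SD 15.
print(deviation_in_tenths(60.0, 50.0, 20.0))             # 5.0
print(round(t_against_norm(60.0, 15.0, 25, 50.0), 2))    # 3.33
```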


Results 


Table 1 shows the means and standard 
deviations of the Kuder raw scores for the 
different activity groups. Listed also are the 
Pitt Kuder norms as determined by Winn 
(11). 
Table 1


Means and Standard Deviations of Kuder Raw Scores 








Activity Group 
(Male) 


N 


1 
Mec 


2 3 
Com Sci 


4 
Per 


5 
Art 


6 
Lit 


7 
Mus 


8 
Soc 


9 
Cle 





Service 


Band 


Vocal 


Publications 


Religious 


(active) 


Religious 
(inactive) 


Dramatics 
Athletics 
Leadership 
Combined groups 


(male) 


Pitt norm group 
(male) 


1426 


42 Mean 
SD 


74 Mean 
SD 


52 Mean 
SD 


78 Mean 
SD 


43 Mean 
SD 


70 Mean 
SD 


50 Mean 
SD 


218 Mean 


SD 


53 Mean
SD 


680 Mean 


SD 


Mean 
SD 


65.21 
20.86 


69.90 


33.42 
12.02 


68.55 
18.01 


36.53 
12.55 


72.88 
15.90 


30.75 
11.72 


67.38 
18.96 


33.54 
13.21 


68.92 
19.65 


32.46 
12.84 


61.42 
19.62 


35.14 
12.85 


69.93 
14.77 


29.70 
12.17 


58.30 
16.85 


34.25 
12.35 


67.25 
16.02 


34.74 
12.00 


62.94 
19.06 


33.84 
12.64 


66.18 
18.52 


36.58 
11.26 


69.81 
17.53 


74.00 
19.99 


71.53 
18.58 


73.92 
19.94 


79.12 
16.38 


78.65 
19.75 


74.36 
18.71 


88.10 
21.98 


72.69 
18.71 


85.82 
20.47 


76.18


19.77 


76.71 
13.14 


43.78 
15.50 


40.78 
13.22 


42.10 
13.88 


42.32 
12.82 


39.79 
13.70 


43.50 
12.74 


45.80 
15.48 


47.55 
15.30 
39.55 
10.97 


43.90 
14.38 


44.58 
14.00 


48.55 
12.74 


42.88 
16.85 


49.98 
15.10 


58.47 
17.42 


53.80 
17.97 


45.36 
13.91 


56.10 
16.29 


46.32 
12.33 


53.51 
16.18 


49.40 
16.44 


49.68 





Activity Group 
(Female) 





Publications 


Dramatics 


Vocal 


Leadership 


Combined groups 


(female) 


Pitt norm group 
(female) 


40 Mean 
SD 


57 Mean 
SD 


56 Mean 
SD 


50 Mean 
SD 


203 Mean 
SD 
371 Mean 
SD 


26.38 
14.88 


45.12 
16.30 


23.14 
12.07 


48.58 
16.39 


26.28 
12.62 


51.11
17.58 


24.50 
11.46 


53.00 
16.25 


24.98 
12.75 


49.68 
16.90 


26.38 
12.18 


50.18 
20.00 


72.88 
14.78 


80.07 
16.83 


70.30 
17.07 


73.10 
18.50 


74.24 
17.37 


72.62 
16.62 


53.12 
15.55 


52.18 
13.83 


50.39 
15.96 


49.30 
16.38 


51.16 
15.48 


49.94 
15.49 


62.38 


19.85 


60.51 
14.54 


54.86 
15.12 


56.60 
16.15


58.35 
16.52 


56.55 
16.79 


19.43 
10.05 


32.30 
8.28 


28.11 
8.71 


20.58 
9.85 


20.26 
9.62 


18.53 
9.28 


20.98 
8.64 


17.21 
8.96 


20.87 
10.35 


21.12 
10.40 


19.68 


26.42 


8.92 


25.84 
8.32 


30.84 
8.59 
25.18 
8.54 


27.17 
8.88 


25.94 
8.54 





77.36 
24.62 


70.31 
20.73 


77.38 
19.15 


69.88 
20.80 


81.01 
26.66 


74.64 
23.02 


77.70 


19.23 


73.86 
18.95 


76.01 
21.14 


74.48 
21.16 


70.71 


19.42 


88.25 
17.98 


89.24 
19.66 


86.64 


49.86 
15.89 


47.88 
13.79 


46.42 
13.96 


13.09 


48.72 
14.51 


51.00 
16.00 


48.23 
13.10 


51.62 
14.24 


48.60 
14.28 


50.68 
14.16 


54.12 
15.45 


50.77 


17.42 


51.46 
18.44 


52.00 
15.88 


51.93 
16.80 


54.30 





[Fig. 1 panels (male): Combined Groups (N = 680); Athletics; Publications; Dramatics; Leadership (elected student officers); Religious (active), YMCA (N = 43); Religious (inactive), YMCA (N = 70). Each panel plots the nine Kuder scales: Mec, Com, Sci, Per, Art, Lit, Mus, Soc, Cle.]

Fig. 1. Activity groups Kuder scale means related in tenths of standard deviations to Pitt Kuder means (male). (* Significant at 1% level; ** significant at 5% level.)

[Figs. 2 and 3 panels include Combined Groups (female, N = 203), Publications, Dramatics, Leadership (N = 50), Band (male), and Vocal (male).]

Fig. 2. Activity groups Kuder scale means related in tenths of standard deviations to Pitt Kuder means (female). (* Significant at 1% level; ** significant at 5% level.)

Fig. 3. Activity groups Kuder scale means related in tenths of standard deviations to Pitt Kuder means. (* Significant at 1% level; ** significant at 5% level.)


The complex analysis of variance pro- 
cedure as applied to the nine male activity 
groups yielded an F ratio of 5.06 which was 
considerably above the 1.5 required for the 
1% confidence level. For the four female 
activities, the F ratio was 1.64 which indi- 
cated significance at the 5% level of confi- 
dence, since 1.8 and 1.5 were required for the 
1% and 5% levels. 

The results of this statistical procedure, 
which treated differences in profiles as con- 
tributed by all nine Kuder scales, indicated 
that the Kuder Preference Record did dis- 
tinguish between the activity groups. The 
results were highly significant for the males 
but only moderately so for the females. 

The profiles of the activity groups are
shown in Fig. 1 to 3. Inspection of these
showed considerable agreement between the
character of each group and high scores on
the similarly named Kuder scales.

The band, as well as both male and female
vocal groups, scored very high on the musical
scale. The male leadership group was high
on the persuasive scale; the male publications
group reflected high literary interest; the
male dramatics activity scored high on the
literary scale and low on the mechanical,
computational, and scientific scales. As
indicated by the single asterisk, these
deviations were all significant at the 1% level
of confidence. At the 5% level, the religious
groups (YMCA) placed above the mean in
social service, and, on the persuasive scale,
the active members were higher than the
inactive.

The profiles of the female groups reflected assumed interests less markedly than those of the males. Apart from the vocal group, which was high in musical interest, the only other significant deviation was the high persuasive interest of the dramatics group.

In this study, it might have been worthwhile to investigate the nature of the female population further, in order to answer why the female profiles were less distinctive and why the female dramatics group scored highest on the persuasive scale rather than on the literary scale, as the male dramatics group did.


Summary 


Kuder Preference Record Form BM inter- 
est profiles were drawn for nine male and 
four female extracurricular groups at the 
University of Pittsburgh. The subjects were 
603 males and 203 female freshmen, sopho- 
mores, juniors, and seniors. Their Kuder 
scores were secured from the freshman test 
files. 

A complex analysis of variance treatment 
of the data indicated statistically significant 
differences between the group profiles. These 
were at the 1% confidence level for the male 
groups and at the 5% level for the female 
activities. 

Inspection of the profiles revealed that cer- 
tain interests, rationally assumed to be mani- 
fest from the names and character of the ac- 
tivities, were generally well reflected by simi- 
larly named Kuder scales. The findings were 
much less distinct for the female groups. 


Received February 15, 1954. 
The Selection of Drafting Trainees¹


Mervyn William Perrine²


University of Connecticut 


Although draftsmen are currently at a pre- 
mium and numerous companies have estab- 
lished drafting training programs, there ap- 
pears to be little information concerning the 
selection devices which can be used for this 
occupation. The present study was done for 
a large Connecticut manufacturer who wished 
to establish a seven-week vestibule training 
program for draftsmen. The testing program 
which was developed as a result of this train- 
ing program was based on the analysis of two 
independent groups. 

The major stages of the investigation were: 
(a) the development of criteria for draftsmen 
already employed; (b) the administration of
a battery of tests believed to be of predictive 
value; (c) the use of the most promising tests 
with the employed draftsmen to select trainees 
for the vestibule school; (d) follow-up of the 
trainees in the training program; and (e) 
follow-up of the vestibule school graduates 
who were hired as draftsmen. 


Method 


Subjects. The two groups studied were: em- 
ployed draftsmen and applicants for the vestibule 
training school. The second group was studied at 
three different selective stages. 

The first group consisted of 20 draftsmen with 
job experience ranging from 3 months to 24 years. 
All of them were members of the same department. 
Participation in the program was not mandatory, 
yet only one draftsman refused. These Ss had from 
11 to 14 years of schooling. In terms of job classifi- 
cation, there were five senior draftsmen, six design 
draftsmen, eight detail draftsmen, and one junior 
draftsman. There was only one woman in this 
group. 

A group of approximately 55 applicants was screened by the company's usual procedures, but only 36 members of this group were accepted for testing and further consideration.³ At the first selective stage for the second group of Ss, there were these 36 applicants, eight of whom were women. Their education ranged from 12 through 15 years of schooling. The preceding month, 22 of these applicants were graduated from high school and 3 others had left college without graduating. None of the applicants had had any previous experience as a draftsman, although many of them had completed high school courses in mechanical drawing.

1 This article was written in partial fulfillment of the requirements of an honors program at the University of Connecticut.

2 Now at Princeton University and, during 1954-1955, at the University of Amsterdam.

3 One member of this group (N = 36) had a partial language barrier and thus could not be given the complete battery. His scores are not included in the data. Hence, there may seem to be an occasional discrepancy in N.

Completion of the seven-week training school marked the second stage of selection. Twenty-six trainees were graduated, including five women.

At the third stage, 11 of the graduating trainees were hired as draftsmen. They were all males, and all had been enrolled in high school or college the month preceding selection for the training program. These 11 trainees were selected on the basis of their final grades in drawing, the independent appraisal of their drawings by two drafting supervisors, and, to a much smaller degree, their final grades in mathematics.

Criteria. The criterion of success on the job was obtained by having the 20 employed draftsmen rank each other, and by having them ranked by their four supervisors, in terms of general job competence. The method may be called "alternation ranking" (3), in that the ranks were assigned in the pattern: highest (in the group), lowest, next highest, next lowest, etc. Each ranker evaluated only those draftsmen whom he knew well enough to judge. Because the judges did not all rank the same number of draftsmen, each set of rankings had to be given an equivalent meaning by placing it on a common scale; a table for converting ranks into normalized scores prepared by Larson (2, pp. 491-492) was used for this purpose. The standardized ranking scores for each draftsman were placed in two groups: (a) the ranks assigned by his supervisors, and (b) the ranks assigned by his co-workers. Each draftsman received a Supervisor Evaluation Score (SES) and a Co-Worker Evaluation Score (CES), which were the means of his standardized ranking scores in the two respective groups. The mean of the SES's was 28.86 (SD = 9.43), and that of the CES's was 28.55 (SD = 7.11).

The reliability coefficient of the ranking criterion was determined by correlating the SES's with the CES's; an r of .92 was obtained. Because all 20 draftsmen and their supervisors were members of the same department and had been working together for some time, this high agreement between the ranks assigned by the supervisors and by the co-workers indicates a high consistency in ranking. The SES's were used as the criterion of job proficiency.
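The alternation pattern described under Criteria can be made concrete. In the sketch below, a judge's overall impressions are modeled as hypothetical numeric scores (the judges in the study assigned no such numbers), and the function fills rank 1, then the last rank, then rank 2, then the next-to-last rank, and so on. With perfectly consistent impressions the outcome equals an ordinary sort; the point of the procedure is that a human judge is asked for the easiest decisions, the extremes, first.

```python
def alternation_rank(impressions: dict[str, float]) -> dict[str, int]:
    """Assign ranks in the order highest, lowest, next highest, next lowest, ...

    `impressions` is a hypothetical stand-in for one judge's view of each
    person; higher means more competent. Rank 1 is the best.
    """
    remaining = dict(impressions)
    ranks: dict[str, int] = {}
    top, bottom = 1, len(remaining)
    pick_highest = True
    while remaining:
        if pick_highest:
            name = max(remaining, key=remaining.__getitem__)
            ranks[name] = top
            top += 1
        else:
            name = min(remaining, key=remaining.__getitem__)
            ranks[name] = bottom
            bottom -= 1
        del remaining[name]
        pick_highest = not pick_highest
    return ranks


# Hypothetical judge rating four draftsmen.
judge = {"Adams": 7.0, "Baker": 3.0, "Clark": 9.0, "Davis": 5.0}
print(alternation_rank(judge))
# {'Clark': 1, 'Baker': 4, 'Adams': 2, 'Davis': 3}
```

In the study each judge's ranks were then converted to normalized scores with Larson's table and averaged into the SES and CES; that table is not reproduced here.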

The criterion of success in the vestibule school was 
based on the final grades in drawing and in mathe- 
matics. There were two instructors in the school, 
one for each subject, and approximately the same 
amount of course time was devoted to each of these 
subjects. The weekly grades for each trainee were 
found to be highly consistent for each subject. The 
r between the final grades in drawing and in mathe- 
matics was .91. 
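Both consistency figures reported so far, the r of .92 between SES's and CES's and the r of .91 between drawing and mathematics grades, are presumably Pearson product-moment correlations. A self-contained sketch of that coefficient follows; the paired scores in the example are invented for illustration and do not reproduce either reported value.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical supervisor (SES) and co-worker (CES) scores for five draftsmen.
ses = [41.0, 35.0, 29.0, 22.0, 17.0]
ces = [38.0, 33.0, 30.0, 24.0, 19.0]
print(round(pearson_r(ses, ces), 3))
```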

The criterion of success on the job for the 11 
trainees hired as draftsmen was. determined by a 
follow-up study in which the trainees were rated on 
ten traits. The continuum for each trait had five 
descriptive points. The positive ends of the con- 
tinua were alternated on the rating form to encour- 
age the judges to consider each trait carefully and 
individually. These trainees were rated by their 
drafting supervisors after three to six months on 
the job. 
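Because the favorable end of each continuum was alternated on the form, ratings have to be re-keyed before they can be averaged into the composite score used for the 11 hired trainees. A minimal sketch follows, assuming points are numbered 1 to 5 from the left end of each printed continuum; which traits were reversed is not reported, so the indices in the example are hypothetical.

```python
SCALE_POINTS = 5  # each trait continuum had five descriptive points


def composite_rating(raw: list[int], reversed_traits: set[int]) -> float:
    """Mean of ten trait ratings after re-keying the reversed continua.

    raw[i] is the point checked on trait i, counted 1..5 from the left end
    of the printed continuum; reversed_traits lists the traits whose
    favorable end was printed on the left (hypothetical here).
    """
    if len(raw) != 10:
        raise ValueError("expected ratings on ten traits")
    keyed = []
    for i, point in enumerate(raw):
        if not 1 <= point <= SCALE_POINTS:
            raise ValueError(f"trait {i}: point {point} is off the scale")
        # Re-key so that 5 is always the favorable end.
        keyed.append(SCALE_POINTS + 1 - point if i in reversed_traits else point)
    return sum(keyed) / len(keyed)


ratings = [4, 2, 5, 1, 3, 3, 4, 2, 5, 1]
print(composite_rating(ratings, reversed_traits={1, 3, 5, 7, 9}))  # 4.2
```

Averaging the ten keyed values weights every trait equally, as the study did.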

The tests. Various drafting and engineering su- 
pervisors were questioned concerning the areas in 
which they thought drafting school applicants should 
be tested. After a survey of the literature and an 
appraisal of various tests available, it was felt that 
the following tests would be promising as a pre- 
liminary battery: 


1. Differential Aptitude Tests (DAT), Abstract Reasoning, Form B.

2. DAT, Clerical Speed and Accuracy, Form B.

3. DAT, Mechanical Reasoning, Form B.

4. DAT, Space Relations, Form B.

5. Kuder Preference Record, Vocational, Form CH.

6. Revised Minnesota Paper Form Board (MPFB), Form B.

7. Richardson, Bellows, Henry & Co., Test of Learning Ability, Form S. (A 15-minute condensation of the civilian version of the AGCT and referred to hereafter as the AGCT.)

8. Wesman Personnel Classification Test (WPCT), Form B.


Table 1

Distribution of Test Scores for 20 Employed Draftsmen and Correlations Between Tests and Criterion

Test                                  Mean
AGCT                                  39.4
DAT, Abstract Reasoning               28.2
DAT, Clerical Speed and Accuracy      67.3
DAT, Mechanical Reasoning             44.9
DAT, Space Relations                  49.8
Kuder, Artistic                       35.3
Kuder, Mechanical                     50.5
MPFB                                  45.5
WPCT, Numerical                       10.1
WPCT, Verbal                          19.3

* Significant at the 5% level.
** Significant at the 1% level.

The tests were administered in conformance with 
the time limits and instructions given in each test 
manual. This preliminary battery was given to the 
employed draftsmen in a single 4.5-hour period. 


Results 


Some of the correlations were promising for 
selection, as can be seen by inspection of 
Table 1, which shows the correlations be- 
tween the SES ranking criterion and the 
eight tests used with the employed drafts- 
men. The tests given to the applicants were 
selected principally on the basis of the cor- 
relations with the ranking criterion, although 
the AGCT and WPCT were included in order 
to investigate them further. The battery for 
the applicants consisted of the following tests: 
AGCT, DAT Space Relations, Kuder, and 
WPCT. Only the Mechanical and Artistic 
Interest sections of the Kuder were consid- 
ered in processing the applicants’ test results. 

Table 2 shows the correlations between the 
standard scores of the training school final 
grades and the test scores of the 26 graduat- 
ing trainees. The order of merit in the 
graduating class was based exclusively on the 
final grades. Each trainee was assigned an 
order of merit in drawing and in mathematics. 
The orders of merit were transmuted into 
standardized scores using the method pre- 
sented by Garrett (1, pp. 172-173). The r’s 
are shown for standard scores in drawing and 
in mathematics and for the average of the 
standard scores in drawing and mathematics. 
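Transmuting an order of merit into a standard score treats each rank as a position under the normal curve. The sketch below illustrates the idea with Python's `statistics.NormalDist`: it computes the percent position 100(R - 0.5)/N used in Garrett's procedure, but replaces his printed conversion table with the exact normal deviate and an assumed scale of mean 50 and SD 10, so its numbers will not match the study's standard scores.

```python
from statistics import NormalDist


def merit_to_standard_scores(n: int, mean: float = 50.0, sd: float = 10.0) -> list[float]:
    """Normalized standard score for each order-of-merit rank 1..n (1 = best).

    The mean/SD scale is an assumption; Garrett's table uses its own scale.
    """
    if n < 1:
        raise ValueError("n must be positive")
    unit = NormalDist()
    scores = []
    for rank in range(1, n + 1):
        proportion = (rank - 0.5) / n       # percent position / 100
        z = unit.inv_cdf(1 - proportion)    # rank 1 -> large positive z
        scores.append(mean + sd * z)
    return scores


scores = merit_to_standard_scores(26)  # one score per rank among 26 graduates
# Hypothetical trainee ranked 3rd in drawing and 5th in mathematics:
average = (scores[3 - 1] + scores[5 - 1]) / 2
print(round(scores[0], 1), round(scores[-1], 1), round(average, 1))
```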

Table 2 also shows the means and standard deviations of test scores for the trainees at the three different selective stages. Two tests, the DAT Space Relations and the Kuder Mechanical, are significantly correlated with the criterion for both the employed draftsmen and the trainees. Examination of Table 2 indicates that for all six tests there is an increase in the mean and a decrease in the variability as the less promising trainees are eliminated from the program. The effect of selection on the means and standard deviations is most pronounced for the DAT Space Relations and Kuder Mechanical. Since test scores were not used as a basis for termination, this relationship is a further indication of the validity of the tests for the training program. The effect of selection on the means and standard deviations of the drawing grades is more pronounced than on those of the mathematics grades, but this is because the 11 trainees who were hired were selected principally on the basis of their drawing grades.


Table 2

Distribution of Test Scores for Applicants and Correlations Between Standardized Drawing and Mathematics Scores

                         Stage of Study                                   r with Standard Scores
                         First (N=35)    Second (N=25)   Third (N=11)     Drawing  Math.  Average (D.&M.)
Test                     Mean    SD      Mean    SD      Mean    SD
AGCT                     39.1    5.8     41.0    5.3     41.3    4.8      .56**    .69**  ...
DAT, Space Relations     51.8   25.9     63.5   21.4     ...     ...      ...      ...    ...
Kuder, Artistic          36.4   10.5     39.6   10.0     ...     ...      ...      ...    ...
Kuder, Mechanical        42.8   14.3     46.8   11.6     ...     ...      ...      ...    ...
WPCT, Numerical           9.4    3.4     10.5    3.0     ...     ...      ...      ...    ...
WPCT, Verbal             21.5    8.9     22.3    8.4     ...     ...      ...      ...    ...
Course
Drawing                  ...     ...     50.1   18.0     ...     ...
Mathematics              ...     ...     50.1   18.0     ...     ...

* Significant at the 5% level.
** Significant at the 1% level.
† N = 25.


The ten traits on the rating scale for the 11 trainees who were hired as draftsmen were weighted equally. A composite rating score for each of these trainees was obtained by averaging his ratings on the ten traits. None of the correlations between these composite rating scores and the test scores was significant. The r's were: AGCT .23; DAT Space Relations -.18; Kuder Artistic Interest -.24; Kuder Mechanical Interest .09; WPCT, Verbal .17; and WPCT, Numerical -.02. As these 11 trainees were of highly restricted range, having undergone four different selection processes, it would be difficult to find a measuring instrument that would differentiate between them. The more stable alternation ranking method could not have been used in the follow-up study because the 11 trainees were divided among three departments.

To study the effectiveness of the various 
tests, arbitrary cutting scores 1 SD below the 
means of the employed draftsmen were used. 
The effects of these cutting scores are shown 
in Table 3. The DAT Space Relations and 
the Kuder Artistic are in general the two 
most stable tests for the two independent 
groups, the employed draftsmen and the 
trainees. 
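The cutting-score analysis summarized in Table 3 can be sketched as follows. The SD convention (population SD of the employed draftsmen), the treatment of a score exactly at the cut (retained), and the scores in the example are assumptions for illustration only.

```python
from statistics import mean, pstdev


def cutting_score(reference: list[float], k: float = 1.0) -> float:
    """Score k SDs below the mean of the reference group (employed draftsmen)."""
    return mean(reference) - k * pstdev(reference)


def percent_rejected(scores: list[float], cut: float) -> float:
    """Percentage of a group scoring strictly below the cutting score."""
    if not scores:
        raise ValueError("empty group")
    below = sum(1 for s in scores if s < cut)
    return 100 * below / len(scores)


# Hypothetical reference and applicant scores on one test.
draftsmen = [40.0, 50.0, 60.0]
cut = cutting_score(draftsmen)            # about 41.8
applicants = [38.0, 45.0, 52.0, 41.0]
print(percent_rejected(applicants, cut))  # 50.0

# Check against the text: 1 of the 11 hired trainees fell below the
# DAT Space Relations cut, i.e. about 9% of that group.
print(round(100 * 1 / 11))  # 9
```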

A further application of the same cutting 
scores showed that, of the 11 trainees hired, 
7 were above the cutting scores on all six 
Table 3


Percentage of Group Hypothetically Rejected by a 
Cutting Score 1 SD Below Mean of 
Employed Draftsmen 








Drafts- 
men 
Above 
Mean 
on Cri- 
terion 
(N=8) 


33% 


Drafts- 
men 
Below 
Mean 
on Cri- 
terion 
(N=12) 


17% 


Trainees Trainees 
Released Hired 
(N=24) (N=11) 


13% 0% 


Test 


AGCT 
DAT, Space 
Relations 33 13 21 9 
Kuder, 
Artistic 33 0 33 0 
Kuder, 
Mechanical a 14 63 
WPCT, 
Numerical 33 21 
WPCT, 
Verbal : 33 33 








tests, while 1 was below on the DAT Space 
Relations and 3 were below on the Kuder 
Mechanical. On the same basis, 4 of the 
trainees who were not hired as draftsmen 
were above the cutting scores on all six tests, 
while 20 of these trainees were below on at 
least one of the six tests. The requirement that the applicant be above the cutting score on all six tests would have eliminated 36.4% of the "successes" and 83.3% of the "failures."
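The two percentages follow directly from the counts in the preceding paragraph: 11 - 7 = 4 of the 11 hired trainees and 20 of the 24 trainees not hired fell below at least one cut. The helper `passes_all` below is a hypothetical illustration of the multiple-cutoff rule itself; the score and cut values in its example are invented.

```python
def passes_all(scores: dict[str, float], cuts: dict[str, float]) -> bool:
    """Multiple-cutoff rule: accept only if every score is at or above its cut."""
    return all(scores[test] >= cut for test, cut in cuts.items())


def percent(part: int, whole: int) -> float:
    return 100 * part / whole


# Counts from the text.
hired_failing = 11 - 7   # hired trainees below the cut on at least one test
released_failing = 20    # of the 24 trainees not hired
print(f"{percent(hired_failing, 11):.1f}%")     # 36.4%
print(f"{percent(released_failing, 24):.1f}%")  # 83.3%

# Hypothetical applicant and cuts for two of the six tests.
cuts = {"DAT Space Relations": 30.0, "Kuder Mechanical": 35.0}
applicant = {"DAT Space Relations": 42.0, "Kuder Mechanical": 33.0}
print(passes_all(applicant, cuts))  # False
```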

In an attempt to determine the material 
value of the testing program, the total costs 
were obtained and analyzed. Two separate 
analyses were made initially; the first in- 
cluded the “burden,” or overhead factor 
which is attached to the labor costs in all de- 
partments, and the second excluded the over- 
head factor. The final analysis, however, 
was based on the assumption that the burden 
factor would be an intrinsic part of the cost 
of any subsequent drafting training programs 
and, as such, should be included in the ap- 
praisal of the material value of the present 
program. The burden factor used for the de- 
partments studied was 115%, and all of the 
following figures except supplies include this 
factor. 

The total cost of the training program was
$31,991, which was distributed as follows:
$1,450 for the testing program research and 
administration; $946 for training supplies, 
etc.; $18,514 for the trainees’ wages; and 
$11,081 for instruction and the salaries of 
company employees connected with the train- 
ing program. On the basis of the above fig- 
ures, it cost the company $1,865 to obtain 
each draftsman in the present training pro- 
gram, of which the training school wages 
were only $640, while fixed costs and the 
other salaries accounted for the remaining 
$1,225. 
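The cost figures can be checked with simple arithmetic. The components sum to the stated total, and dividing the non-wage components by the 11 draftsmen hired gives about $1,225, which appears to be how the per-draftsman share of fixed costs and other salaries was obtained; the text does not say so explicitly, so treat that as an inference. How the $640 wage share was derived is not stated and is not reconstructed here.

```python
costs = {
    "testing program research and administration": 1_450,
    "training supplies, etc.": 946,
    "trainees' wages": 18_514,
    "instruction and other salaries": 11_081,
}
total = sum(costs.values())
print(f"total: ${total:,}")  # total: $31,991

# Inference, not stated in the text: non-wage costs spread over 11 draftsmen.
non_wage = total - costs["trainees' wages"]
print(f"non-wage per draftsman: ${non_wage / 11:,.0f}")  # non-wage per draftsman: $1,225

# Per-draftsman cost as stated: $640 wages plus the $1,225 remainder.
print(f"per draftsman: ${640 + 1_225:,}")  # per draftsman: $1,865
```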

Solely for the purpose of illustration, let us 
analyze the possible efficiency of the selection 
program. After the 36 applicants had been 
tested, they were rated in terms of their prob- 
able qualifications for drafting on the basis 
of their test battery scores. As a result of 
these ratings, three applicants whose test 
scores were extremely low were not hired as 
trainees. If we assume that $1,865 would be a relatively constant cost for training each draftsman regardless of slight enrollment variations, the company saved $3,675 excluding training school wages, or $5,595 including wages, by not hiring these three applicants.


Of the 12 trainees who were rated as “ques- 
tionably qualified” on the basis of test scores, 
only one was ultimately hired as a drafts- 
man. If these 12 trainees had not been 
accepted for training, the company would 
have saved $14,700 excluding their wages, or 
$22,380 including their wages, but would only 
have received 10 draftsmen from the program 
instead of 11. To push the analysis a bit 
further while retaining the same basic as- 
sumptions (although changes in the number 
of draftsmen produced by the vestibule school 
would alter the cost distribution figures), the 
company would have saved $29,400 exclud- 
ing wages, or $44,670 including wages, if the 
arbitrary cutting scores of — 1 SD had been 
used on all six tests. In this case, 24 of the 
trainees would not have been accepted for 
training, 7 would have been trained and hired 
as draftsmen, and 4 would have been trained, 
but not hired as draftsmen. 
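Each savings figure above is a number of trainees multiplied by a per-head cost: $1,225 excluding training school wages or $1,865 including them, under the stated assumption that these costs stay roughly constant. The sketch reproduces the figures for 3 and 12 trainees exactly. For 24 trainees the product including wages is $44,760, whereas $44,670 is printed; the text does not explain the difference, which may be a typographical slip.

```python
COST_EXCLUDING_WAGES = 1_225  # fixed costs and other salaries per draftsman
COST_INCLUDING_WAGES = 1_865  # adds the $640 training school wages


def savings(trainees_not_accepted: int) -> tuple[int, int]:
    """Savings (excluding wages, including wages) from not training n people."""
    return (trainees_not_accepted * COST_EXCLUDING_WAGES,
            trainees_not_accepted * COST_INCLUDING_WAGES)


for n in (3, 12, 24):
    excl, incl = savings(n)
    print(f"{n:>2} trainees: ${excl:,} excluding wages, ${incl:,} including wages")
#  3 trainees: $3,675 excluding wages, $5,595 including wages
# 12 trainees: $14,700 excluding wages, $22,380 including wages
# 24 trainees: $29,400 excluding wages, $44,760 including wages
```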

If there were an abundance of applicants, 
the above method and cutting scores would 
be quite suitable, but it would be a rather 
severe technique if labor were scarce. Using
the above figures and cutting scores, 55 ap- 
plicants would have to have been tested to 
have obtained 11 trainees who would be hired 
as draftsmen. It will be noted that only 55 
individuals applied initially, of whom 19 
were rejected by the company’s usual screen- 
ing procedures. Therefore, 89 individuals 
would have to have applied initially to have 
yielded the 11 trainees hired as draftsmen. 


Summary 


Twenty employed draftsmen were given a 
battery of eight tests and the correlations of 
their scores with a ranking criterion were 
used to establish a battery of four tests which 
was given to 36 applicants for a seven-week 
drafting vestibule school. A follow-up was 
made of the 33 trainees selected for the 
training program, and the validities of the 
tests were determined by using as criteria the 
final grades of the 26 trainees who completed 
the school. A further follow-up was made of 
the 11 trainees who were subsequently hired 
as draftsmen, and the validities of the tests 
were determined by using as a criterion a rating index of their job performance after six months of employment. The material value of
the testing program was estimated by analyz- 
ing the costs of the training program. From 
an analysis of the above data, it is concluded 
that: 

1. A significant, positive relationship ap- 
pears to exist between the DAT Space Rela- 
tions and general drafting competence, as well 


as with training school standard scores in 
drawing. As the DAT Space Relations seems 
to be quite stable for the two independent 
groups, it should be useful in personnel se- 
lection and prediction for drafting, even 
though it was normalized on elementary and 
secondary school students. 

2. A less stable, but positive relationship 
appears to exist between the Artistic and Me- 
chanical Interest sections of the Kuder and 
drafting ability, as measured by general job 
competence and by training school final 
grades. 

3. A significant, positive relationship seems 
to exist between the 15-minute civilian ver- 
sion of the AGCT and the training school 
final grades in both drawing and mathematics. 

4. A significant, positive relationship seems 
to exist between the training school final 
grades in mathematics and both the Verbal 
and Numerical sections of the WPCT. 

5. The requirement that an applicant se- 
lected for training score above the cutting 
point on all six tests would have eliminated 
83.3% of the applicants who were ultimately 
rejected or released and 36.4% of the trainees 
who were hired as draftsmen. 
Are Medical Specialist Interest Scales Applicable to
Negroes?¹


Edward K. Strong, Jr. 


Stanford University 


Hartshorn (1) concluded that Negro phy- 
sicians, lawyers, and life insurance agents 
differ in their interests from corresponding 
groups of white men. 

This writer (2) reviewed Hartshorn’s thesis 
in the light of his own data and suggested 
four possible explanations for Hartshorn’s 
conclusion. The first explanation was that 
his three Negro samples were not equivalent 
to the criterion groups upon which the scales 
were established. On the basis of available 
data it seemed evident that his sample of life 
insurance men was not equivalent to our cri- 
terion group with which it was contrasted. 
Whether or not the corresponding samples of 
Negro and white physicians and lawyers are 
equivalent cannot be determined for lack of 
information regarding them. But until it is 
established that equivalent samples of the 
two racial groups differ in their interests, 
Hartshorn’s conclusions must be held in 
abeyance. 

This report is an attempt to clarify the 
issue. The 60 Negro and 150 white men who 
are compared here are medical college seniors. 
In this respect they are equivalent. But it is 
not known whether they are equivalent or not 
as regards intelligence, scholastic achievement, 
and future medical performance. They do 
afford a basis for answering the practical 
question: are the medical specialist interest 
scales applicable to Negroes? 

The 150 white students are a representative sample of a group of 750 students from 15 medical colleges (3). The 60 Negro medical school seniors are all from Howard University and were secured through the cooperation of Vice-Dean Robert S. Jason. The plan had been to obtain 150 Negro records, but this number proved to be unattainable.

1 This article was part of the final report to the Office of the Surgeon General on Predictions of Success in Medical Residency Training, June 30, 1953. The opinions expressed in the article are those of the author and not necessarily those of the Department of the Army.

Both groups of seniors filled out the Voca- 
tional Interest Blank and the Medical Spe- 
cialist Preference Blank. The blanks were 
scored on 19 interest scales. Mean scores are 
reported in Table 1. The Negroes scored 
higher on 11 scales, the whites on eight scales. 
On nine of the scales the means of the two groups do not differ significantly at the 5 per cent level.

On the 14 occupational scales Negroes 
scored significantly higher than white stu- 
dents on the office man, mathematics-science 
high school teacher, personnel manager, and 
public administrator scales and lower on the 
occupational level, author, and revised phy- 
sician scales. 

On the specialization level scale, indicative of whether a medical student should specialize or not, the Negro students scored slightly higher than the white students. On the four medical specialist scales, the Negroes scored higher on the psychiatrist and pathologist scales and lower on the internist and especially on the surgeon scale.

All in all, the differences in interest scores of the two racial groups are small. The differences could be explained by the hypothesis that a few more Negro than white seniors do not have the interests of physicians. In other words, if applicants were selected solely on the basis of physician interest scores, ten more whites than Negroes per hundred would be selected.
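Table 1 reports a "Percentage Overlap" for each scale. The article does not give the formula, but the printed values appear consistent with Tilton's overlap measure as it is commonly stated: the mean difference is expressed in units of the average of the two SDs (D) and converted through the normal curve as 200 Φ(-D/2). The sketch below implements that assumed formula, not a documented procedure of the author; with the office-man and mathematics-science teacher means and SDs from Table 1 it returns 65 and 79, matching the printed overlaps.

```python
from statistics import NormalDist


def percent_overlap(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Tilton-style percentage overlap of two normal score distributions.

    D is the absolute mean difference divided by the average of the two SDs;
    the overlap is 200 * Phi(-D / 2).
    """
    avg_sd = (sd_a + sd_b) / 2
    if avg_sd <= 0:
        raise ValueError("standard deviations must be positive")
    d = abs(mean_a - mean_b) / avg_sd
    return 200 * NormalDist().cdf(-d / 2)


# Negro vs. white seniors, means and SDs from Table 1.
print(round(percent_overlap(36.3, 9.7, 27.9, 8.8)))   # 65 (office man)
print(round(percent_overlap(45.7, 8.8, 40.6, 10.8)))  # 79 (math.-science teacher)
```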

The medical specialist interest scales were 
devised as an aid in the selection and guid- 
ance of medical school students as to whether 
or not they should specialize, and, if so, in 
what specialty they should engage. It is 
therefore most appropriate to ask: Do these 
two racial groups differ in the percentage who 
should specialize; also do they differ in the 










Table 1 
Interest Scores of Negro and White Medical College Students 








Negro Seniors 
(N=60)


Scales Mean SD 


White Seniors 
(N= 150) 


Difference Percentage 





Occupational 
Office man 
Math.-science teacher 
Personnel manager 
Public administrator 
Masculinity-femininity 
Psychologist (rev. ) 
Psychiatrist 
Life insurance 
Aviator 
Osteopath 
Chemist 
Physician (rev.) 
Author 
Occupational level 
Specialization 
Specialization level 
Psychiatrist 
Pathologist 
Internist 
Surgeon 


36.3 9.7 
45.7 8.8 
37.4 

42.3 8.8 
48.3 8.8 
36.9 

43.7 8.5 
28.2 9.4 


Mean SD in Means Overlap 


27.9 8.8 8.4** 65 
40.6 10.8 ee 79 
32.2 10.8 + ges 81 
39.3 8.8 3.0* 86 
47.3 9.7 1.0 95 
36.2 0.7 97 
42.9 9.8 0.7 97 
27.6 9.7 0.5 98 
38.0 —14 — 96 
46.1 8.7 —2.3 —90 
36.4 —3.1 —89 
51.3 —5.1** — 80 
31.6 —4,7** —78 
54.8 6.7 —5.1** —42 
42.4 10.5 88 
38.0 10.1 87 
26.8 12.1 j 92 
34.2 10.7 

11.1 —82 





* Significant at 5 per cent level. 
** Significant at 1 per cent level. 


percentage who should specialize in each of 
the four specialties? 

Only after a follow-up of medical school 
students has been made will it be possible to 
determine the best way to answer the above 
questions in terms of interest scores. In the 
meantime two different standards may be em- 
ployed on the basis of which the two racial 


Table 2

Percentage Having Interests to Specialize Based on A Ratings on Physician Scale, Specialization Level Scale, and on One of the Specialist Scales

                          Type of Specialist
Group                     Surgeon   Internist   Pathologist   Psychiatrist   Total
White medical seniors     6         ...         ...           11             25
Negro medical seniors     ...       ...         ...           12             23


groups may be compared. The first standard requires that the student shall have an A rating on the physician interest scale, on the specialization level scale, and on one of the four specialist scales. The second standard requires an A or B+ rating on these scales.

There is no practical difference between 
white and Negro medical students on the 


Table 3

Percentage Having Interests to Specialize Based on A or B+ Ratings on Physician Scale, Specialization Level Scale, and on One of the Specialist Scales

                          Type of Specialist
Group                     Surgeon   Internist   Pathologist   Psychiatrist   Total
White medical seniors     15        10          6             22             53
Negro medical seniors     ...       ...         10            21             43
basis of the first standard. (See Table 2.)
On the basis of the second standard, 53 per 
cent of white students should specialize in 
contrast to 43 per cent of Negro students. 
(See Table 3.) 


Summary 


Interest scores of 60 Negro seniors from 
Howard University Medical College are con- 
trasted with the scores of 150 white seniors 
who are a representative sample of 750 stu- 
dents from 15 medical colleges. The differ- 
ences between the two racial groups are 
small, taking everything into account. The 
differences could be explained on the basis 
that ten more white medical students per 
hundred have the interests of physicians than 


Negro medical students. It is quite possible 
that white students from two different medi- 
cal colleges would differ as much. 

There is no warrant for assuming that the medical specialist interest scales are not applicable to Negroes.
in the School of Business Administration
In a previous study (1) the responses of
students in the School of Business Adminis- 
tration were found to differ significantly from 
the responses of business administrators on 
20 of the 40 items contained in an attitude 
questionnaire. The purpose of this study 
was to determine the extent to which atti- 
tudes of students differed from those of busi- 
ness employees when measured by the same 
questionnaire that was completed previously 
by administrators. 


Procedure 


Members of a seminar distributed questionnaires 
containing 40 statements to 49 business employees 
and to 146 business school students. Respondents 
were forced to reply either yes or no to each state- 
ment. Five statements dealt with unionism, 10 with 
government control, 15 with personnel policy, 5 
with profit distribution, 4 with the free enterprise 
system, and 1 with the desirability of business train- 
ing on the college level. The 40 statements used in 
the questionnaire were selected by seminar members 
from a pool of about 200 statements. With the ex- 
ception of three or four statements which were sug- 
gested by outside sources, all statements were con- 
tributed by seminar members. Clarity, conciseness, 
and coverage of specific areas determined the selec- 
tion of each statement. 

Business employees were selected on the basis of 
the size of the establishment where they worked and 
their willingness to cooperate. The largest organiza- 
tion in the general vicinity of a seminar member's 
home was contacted first. None of the employees 
performed administrative or managerial functions. 
In no case did any of the employees supervise any 
other employee. All of the employees were salaried 
personnel. Twenty-six per cent earned $4,000 or 
more per year. Approximately 57% of the sample
had completed from one to four years of college 
training, whereas only 20% had not graduated from 
high school. Moreover, about 70% of the sample 
was composed of persons aged 30 years or older, 
and a similar percentage had worked with their 
present organizations for three years or longer. Of 
the organizations represented in the sample, 18% 
were located in towns of less than 5,000 population 
and 70% employed fewer than 25 employees. In all
cases the firms were located in Mississippi and were 
predominantly engaged in manufacturing or process- 
ing activities. 


The student sample was composed entirely of Mis- 
sissippi State College students enrolled in the School 
of Business and Industry. Approximately 60% of 
the students were upperclassmen. As in the case of 
the employees, the student sample is self-selected to 
the extent that it is composed entirely of students 
who were willing to cooperate with the survey. 


Results 


Significant differences between responses of 
business employees and students of business 
administration were found on 6 of the 40 
statements contained in the questionnaire. 
Specific details are found in Table 1. The 
statements are numbered in accordance with 
their appearance in the questionnaire. The 
significance of the difference between percent- 
ages was estimated from the Lawshe and 
Baker nomograph (2). 
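The Lawshe and Baker nomograph is a graphical shortcut for judging whether two percentages differ reliably. Its exact construction is not described here, so the sketch below substitutes a standard large-sample z test with an unpooled standard error; this is an assumption, not the nomograph itself. For item 1 (34% of 146 students versus 20% of 49 employees) it gives a z of about 2.02, beyond the 1.96 needed at the 5% level and consistent with the single asterisk in Table 1. A pooled standard error would give a smaller z, about 1.84, so the choice of formula matters for borderline items.

```python
import math


def z_for_percentages(pct_a: float, n_a: int, pct_b: float, n_b: int) -> float:
    """Large-sample z for the difference between two independent percentages.

    Uses the unpooled standard error sqrt(p_a*q_a/n_a + p_b*q_b/n_b).
    """
    p_a, p_b = pct_a / 100, pct_b / 100
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("standard error is zero; z is undefined")
    return (p_a - p_b) / se


# Item 1: students 34% yes (N = 146), employees 20% yes (N = 49).
z = z_for_percentages(34, 146, 20, 49)
print(f"z = {z:.2f}, significant at 5%: {abs(z) >= 1.96}")  # z = 2.02, significant at 5%: True
```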


Table 1 


Distribution of Responses to Items in Questionnaire 





Per Cent Replying 
Yes 


Stu- Em- 
dent ployee 


1. Business should receive gov- 
ernment subsidies 34 20 14* 
13. Labor unions help industrial 
progress : 61 14* 
21. The Negro should receive the 
same wages as the white per- 
son for doing the same job 
. A supervisor should work 
himself up from the ranks
. You can tell a person’s in- 
telligence by interviewing 
him 
. Profits resulting from in- 
creased productivity should 
be divided equally among 
stockholders, labor, and the 
consumer 28 


Item Diff. 





* Significant at the 5% level of confidence. 
** Significant at the 1% level of confidence. 
It is noteworthy that the greatest diver-
gence of opinion between students and em- 
ployees occurred on the issue of profit dis- 
tribution. Control questions indicated that 
the largest percentage (58%) of students 
agreed that the major portion of profits re- 
sulting from increased productivity should go 
to the stockholders, while the smallest per- 
centage (38%) of students agreed that the 
major portion should go to labor. Conversely, 
the employees professed their greatest per- 
centage of agreement (53%) in allowing the 
major portion of profits to go to the con- 
sumer. The smallest percentage (43%) of 
employees agreed that the major portion of 
profits should go to labor. Upon further 
analysis it appears that the maximum di- 
vergence between attitudinal responses of stu- 
dents and employees revolves about the as- 
pect of equality of profit distribution among 
consumers, stockholders, and labor. Em- 
ployees are predominantly in favor of equal 
distribution. The students, on the other 
hand, view equal distribution with consider- 
able disfavor. 


Discussion 


Results of this study indicate that the stu- 
dents’ attitudes resemble those of business 
employees to a markedly greater degree than 
they resemble those of the administrators 
who were sampled in an earlier study (1). 
Business administrators and students differed 
most notably on issues of government con- 
trol. No such importance is attached to gov- 
ernment control by business employees. 

Conversely, students and administrators are 
generally agreed that a Negro should receive 
the same wages as a white person for doing 
the same job. Employees, however, are less 
favorably disposed toward Negro and white 
equality under the same circumstances.
Moreover, a significantly greater percentage of
employees than students professed to believe that
a supervisor should work himself up from the 
ranks. Although administrators are also more 
favorably disposed than students to super- 
visors rising from the ranks, the difference 
between responses of students and adminis- 
trators was not statistically significant. 


Summary 


An attitude survey blank containing 40 
statements and covering the areas of govern- 
ment control, personnel policy, profit dis- 
tribution, unionism, and the free enterprise 
system was completed by 49 business em- 
ployees and 146 business administration stu- 
dents. 

1. Significant differences between responses 
of the two groups were found on 6 of the 40 
statements. 

2. Disagreement was greatest in the area 
of equality of profit distribution. Students 
regarded equality of distribution with consid- 
erable disfavor. 

3. When results of this study are com- 
pared with those of an earlier study, it be- 
comes evident that the divergence between 
attitudinal responses of students and em- 
ployees is markedly less than the divergence 
between students and business administrators. 
Williams, Roger J. Free and unequal: the
biological basis of individual liberty. Aus- 
tin: Univer. of Texas Press, 1953. Pp. 
177. $3.50. 

Whatever our American revolutionary 
founding fathers may have meant by assert- 
ing that all men are by nature “free and 
equal”—from the context of the time they 
probably meant “equally free”—it is certain 
today that human babies are by no means 
assembly-line duplicates. It would be truer 
to say that every human individual, aside 
from identical twins, is genetically unique. 
Each has his own pattern of hereditary traits. 
The traits most accurately studied by the 
geneticists, such as hair color and finger 
prints, may create the illusion that heredity 
is unimportant in human social life. But 
there are genetic differences also in the 
senses, the muscles, the endocrine glands, 
respiration and circulation, brain structure 
and “brain waves.” These biological differ- 
ences are bound to make people differ in 
their emotions, preferences, ways of thinking,
practical aptitudes, and behavior generally.
If people were all alike to start with, there 
would be no need for personal liberty. They 
could be assigned their social roles by some 
higher authority and trained to fill these roles
efficiently and happily. But given the undoubted
fact of great human variability, what
is needed to insure a good social order is 
individual freedom. “There is no middle 
ground: distinctiveness, individual worth, and 
freedom rise or fall together.” Such is the
author’s thesis, elaborated with great force
and persuasiveness.

An eminent chemist, the author has helped
to extend the study of genetics into
the significant field of biochemistry. As a
teacher and a nutritionist, he is by no means 
neglectful of the importance of environmen- 
tal factors in development and well-being. 
But he insists that nutrition and education 
should be adjusted to the needs of each indi- 
vidual. The author detects a current trend 
toward assembly-line methods and away from
due recognition of individuality. To the law-
maker or administrator regimentation seems
the line of least resistance, but it leads to 
failure if individual preferences and aptitudes 
are very different as they are likely to be. 
In an interesting experiment or “game” the 
author offered about fifty kinds of enjoyment 
(such as alcohol, athletics, collecting, scenery, 
conversation, gardening, aiding other people) 
and had them rated by a number of people. 
The preference profiles showed great indi- 
vidual differences. Similarly varied profiles 
were obtained in a study of the nutritional 
requirements of several persons. But the au- 
thor points out that “never in the entire his- 
tory of the world has even one human being 
been studied comprehensively.” Even a few 
thorough all-round case studies would throw 
into relief the variation and patterning of hu- 
man nature which must be recognized if the 
problems of mankind are to be solved. With- 
out such recognition mutual hatred and con- 
tempt are inevitable between man and man 
and between group and group, misguided at- 
tempts at regimentation will increase, and 
freedom will be progressively lost. Think 
how much disharmony even within a single 
family would disappear if each person recog- 
nized every other’s need and right to be dif- 
ferent! 

From the point of view of the biologist, the 
social sciences seem obsessed with the false 
idea of human uniformity, and this holds even 
for social psychology. Psychologists in general
are too much devoted to averages and to
“the individual,” instead of to individuals.
Yet much can be hoped from psychology, if 
only a larger number of psychologists would 
maintain close contact with the sister science 
of genetics. 

From the tone of this review one might 
expect to find the book heavy reading, which 
is far from being the case. Those who have 
read it have enjoyed it and found it illumi- 
nating and stimulating. 


Robert S. Woodworth 


Columbia University 
Breckinridge, Elizabeth Llewellyn. Effective
use of older workers. Chicago: Wilcox and 
Follett, 1953. Pp. xiv + 224. $4.00. 


Ever since the last depression, the economy 
has been concerned with how to handle the 
older worker. In depression the solution is: 
“get them out” by compulsory retirement; in 
periods of manpower shortage the solution is: 
detour around the established “compulsory 
retirement” plans. Contemporary psycholo- 
gists, sociologists, and economists, while rec- 
ognizing that the problems resulting from the 
increased number of older workers in the la- 
bor supply cannot be solved on an either-or 
basis, favor flexible retirement. 

Despite Edward L. Ryerson’s foreword 
showing a preference for “a fixed retirement 
age,” Mrs. Breckinridge’s book is impressive
primarily because it shows that many business
executives are appreciative of the values
inherent in the older workers’ skills, knowledges,
and excellent work habits. The many
cooperating companies, indeed, have developed
plans for “flexible retirement,” and more
humane hiring policies not only to increase 
the manpower potential for themselves but 
also to augment community values by enhancing
the self-respect of older workers and their
families. 

The book is a golden strand beset with 
four genuine pearls: the quoted addresses of 
Elizabeth Hatch, Carson Pirie Scott and 
Company; Curtis Gallenbeck, Inland Steel 
Company; L. S. Barrus, Cleveland Twist 
Drill Company; John Bromer, Prudential In- 
surance Company. In a sense, the four 
closely packed speeches carry the basic argu- 
ment. Miss Hatch stresses the need not only 
that management abandon the prejudices 
against race and creed, but also that the em- 
ployer (and his surrogates, the managers and 
foremen) must get rid of the bias against age. 
She suggests that industry’s own records on 
production, absenteeism, turnover, and health 
can provide the research evidence for over- 
coming the misjudgments about older workers. 

Gallenbeck, believing in individual differ- 
ences in the aging process, considers the utili- 
zation of older workers either by matching 
their abilities and skills to the jobs they can
do, or more significantly by adjusting the
jobs and the environment to their character- 
istics. Transmotion rather than “downgrad- 
ing” is the key to appreciating the dignity 
and value of older workers, and, thus, all 
workers. 

Barrus describes his company’s procedures 
for making a flexible retirement policy work. 
The approach is through counseling individu- 
als to prevent physical, economic, and psy- 
chological difficulties rather than to cure ma- 
jor disturbances later. The program starts 
by granting pensions at the age of 65 whether 
the worker retires or continues to work. The 
counseling process gives the worker the basis 
for making good decisions about what would 
be best for him. The program has been ef- 
fective both for the older worker and also for 
the plant. 

John Bromer chronicles the development of 
his company’s “group” counseling program, 
which, to a degree, was requested by the 
workers themselves. The group sessions were 
therapeutic not only through the sharing of 
personal experiences and the perceiving of 
the mutuality of problems but also through 
the reduction of “retirement shock.” The 
group enterprises, moreover, give the work- 
ers opportunities to design and redesign a 
comprehensive retirement program. 

Many psychologists, indeed, would find it 
profitable to read Mrs. Breckinridge’s book 
against the richness of background and of 
suggestion for the “effective use of older 
workers” made by Solomon Barkin¹ twenty
years ago. A major value of her book is 
the evidence it gives of the progress that has 
been made toward attaining the objectives 
of Barkin’s constructive action programs to 
eliminate the prejudices against age by utiliz- 
ing the skills and abilities of older workers. 
Perhaps it does take at least a generation to 
disseminate ideas significant for the social 
good. Mrs. Breckinridge gives evidence about 
the richness of the potential harvest. 


Irving Lorge 


Teachers College, Columbia University 


1 Barkin, S. The older worker in industry, a study 
of New York State manufacturing industries. Re- 
port of the Joint Legislative Committee on Unem- 
ployment. Albany: J. B. Lyon Co., 1933. Pp. 467. 
Bellows, Robert M., and Estep, M. Frances.
Employment psychology: the interview. 
New York: Rinehart, 1954. Pp. 295. 
$4.25. 


The text was “written for professional in- 
terviewers and for students of applied psy- 
chology who desire to become acquainted 
with the uses and limitations of the inter- 
view in selection of personnel.” 

At the outset, the authors list the following 
four requirements of self-training aimed at 
helping the interviewer to become more pro- 
ficient: 


1. “Interviewers must be aware of the human fal- 
lacies and errors that seem ever-present in the ap- 
praisal of men; 

2. “they must realize the necessity of maintaining 
an objective point of view; 

3. “they must be familiar with the results of sci- 
entific investigations in this area; 

4. “they must be able to develop, evaluate and 
apply objective methods, such as item analysis tech- 
niques, for specific interview situations.” 


The text was planned to provide fundamental 
information in nontechnical language in the 
above four areas. 


Actually, the text has much wider scope
than is indicated by its title. Relatively little
emphasis is devoted to the techniques of the 
interview per se. Practical statistical analy- 
sis and evaluation are discussed in more 
detail. The research approach to the inter- 
view, as opposed to the strictly clinical inter- 
view, is discussed and brought into sharp 
focus. 

As a whole the text is quite good and prob- 
ably suits the practicing interviewer’s pur- 
poses much better than the student’s. A 
great deal of emphasis is placed on the 
weighted application blank. An example of 
the horizontal percentage method of weight- 
ing the variables is presented. Examples of 
other weighting methods would have been desirable,
since the method presented gives unwieldily
large weights ranging from 0 to 100 for each item.
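The review does not reproduce the book's worked example. The sketch below assumes one common textbook formulation of the horizontal percentage method. Each response alternative is weighted by the percentage of applicants choosing it who fall in the successful criterion group, with successful and unsuccessful groups of equal size. That rule is what produces weights anywhere from 0 to 100. The function name, the item, and the counts are all hypothetical.

```python
def horizontal_percentage_weights(counts):
    """counts maps each response alternative to (n_successful, n_unsuccessful).
    Returns, for each alternative, the percentage (0-100) of applicants who
    chose it and belong to the successful group; None if nobody chose it."""
    weights = {}
    for response, (good, poor) in counts.items():
        total = good + poor
        weights[response] = round(100 * good / total) if total else None
    return weights

# Hypothetical item "Years of schooling": 100 successful, 100 unsuccessful employees.
item = {"less than 8": (10, 30), "8-11": (40, 45), "12 or more": (50, 25)}
print(horizontal_percentage_weights(item))
# {'less than 8': 25, '8-11': 47, '12 or more': 67}
```

Even in this small case the weights span a wide range. In practice such percentages are often collapsed into a few small integers for easier scoring.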

Chapter 6, “The Interviewing Process,” is 
concerned with the management of the inter- 
view; such items as training interviewers, 
maintaining rapport, recording information,
and using the pause are intelligently discussed.

It is believed that Chapter 8, “Errors in 
Making Judgments,” will be one of the most 
stimulating sections to interviewers. It is
very common to hear, “I can easily size up 
an applicant.” This chapter will very ade- 
quately serve to present the many pitfalls in 
“sizing up people” and should lead to clearer 
thinking. Experienced employment inter-
viewers will be challenged and many will not
perceive this chapter at all. 

All of the chapters reflect a keen practical 
knowledge of the personnel field. The text 
will be appreciated more by readers who 
have a pretty extensive statistical background 
since mention is made of such items as chi 
square, the Wherry-Doolittle method, item 
analysis, and the successive hurdles method. 

The emphasis throughout the text is that 
the selection interview using the so-called 
talking methods alone lacks validity and 
utility. More objective selection material 
must be used in conjunction with the inter- 
view if selection is the goal. Those who use 
the interview per se as their sole selection 
device will probably have their feathers 
ruffled. 

One aspect which has some merit but seems 
to be neglected by Bellows and Estep is that 
pertaining to the rational approach available 
to the interviewer. That is, if the inter- 
viewer knows on an a priori basis what things 
he is looking for in the applicant, there is rea- 
son to believe that these aspects can be 
measured in the interview although perhaps 
not as accurately as through tests. 

The reviewer would have liked to see dis- 
cussed such related topics as occupational in- 
formation and attitudes, age differences in 
job values, and pertinent information from 
the counseling field pertaining to the degree 
of leads various interviewers use. The pres- 
entation would have been more stimulating 
and complete, but it is well worth reading 
regardless. 


Bernard Hanes 


Personnel Research Consultants 
Davis, A., and Eells, K. Davis-Eells test
of general intelligence or problem-solving 
ability. (Davis-Eells Games.) Yonkers: 
World Book Co., 1953. Manual, Pp. 72, 
$0.80; primary test (grades 1-2), Pp. 16, 
and directions, Pp. 16, specimen set $0.35; 
elementary test (grades 3-6), Pp. 20, and 
directions, Pp. 19, specimen set $0.35. 


These scales represent the culmination of 
an enormous amount of effort on the part of 
the “Chicago” group to develop tests which 
are interesting to children, free from cultural 
status bias, and not dependent upon schooling 
and specific training. In order to eliminate 
reading ability as a source of variance, pic- 
torial items (drawings) are used, with oral 
presentation of the problem posed by each 
item. The test administrator is urged to 
maintain a “games” rather than a “test” at- 
mosphere. 

After several tryouts, four types of items 
were deemed satisfactory: (a) “probabilities”:
which of three responses is true as describing
what is happening in a picture; (b) “best
ways”: which of three pictured ways is best
for doing something; (c) “analogies”: similar
to the usual verbal analogies except that
objects are pictured; and (d) “money” problems:
which of three sketched starting combinations
of coins is the best in order to have a given
number of cents when other coins are added.
The money items do not appear in the pri- 
mary test, nor does the primary test for 
grade 1 include probabilities. 

It is admitted that the intellectual activi- 
ties required for the tasks “are not unitary, 
but complex.” On the basis of interviews 
with testees, eight types of mental processes 
are said to be involved in the solutions of the 
posed problems, but none of these eight will 
be found among the rubrics of the factor 
analysts. The reviewer’s casual examination 
of the items suggests the possibility that dif- 
ferences in auditory perceptual span will af- 
fect performance on probabilities since the 
child must choose from three alternative 
statements presented orally, that differences 
in perceiving detail will be frequently in- 
volved, and that numerical ability is called 
for in the money items. It is said that “Such 
a range of mental activities is also required 


in solving most problems in life and in the 
school curriculum.” This apparent excuse 
for a hodge-podge test is not exactly recon- 
cilable with the authors’ repeated emphasis 
that school achievement is an unsatisfactory 
criterion for ascertaining test validity. Actu- 
ally, the final scales correlate in the low .40’s 
with achievement measures. Despite the fact 
that such low correlations are called “sub- 
stantial,” it is evident that school achieve- 
ment does not depend very much on the ac- 
tivities being measured by the Davis-Eells 
Games. 

The authors propose to replace the vague 
term “intelligence” by the more descriptive 
term “problem-solving ability.” To this the 
reviewer would have no objection provided 
evidence were marshalled to show something 
of the generality of problem-solving ability. 
What of the intercorrelations among the four 
subtests? And how highly do these subtests 
correlate with other tasks having equally 
plausible “face” validity? Replacing one 
vagueness with another vagueness can only 
lead to vagueness. 

Commendation is due for the painstaking 
and extensive work entailed in the develop- 
ment of these tests. The final selection (or 
elimination) of items was based primarily on 
Gulliksen’s “reliability index” (maximization
of variance) and secondarily on differentia- 
tion between school grade levels. The norms 
are based on 19,000 cases, a sample which is 
shown to be typical of urban children. Cor- 
relations with various forms of Otis scales 
range from .39 to .66 with about .55 as a 
typical value. 

Although the authors “believe it is un- 
fortunate to apply the term ‘IQ,’ with its 
present connotations and its various vague 
meanings, to the score on any test,” they go 
on to say that users of the test, if they prefer, 
“may quite appropriately apply this term 
[IQ] to what the authors prefer to call the 
Index of Problem-Solving Ability.” Thus the 
IQ receives still another connotation! 

The most disturbing aspect of the Davis- 
Eells Games is the fact that with 60 minutes 
of testing time for grade 1 the reliability 
(Brown-Spearman) is only .68, for 90 min- 
utes of testing in grade 2 the reliability is 
.82, and for 110 minutes of time for grades
3 to 6 the reliabilities are .84, .83, .82, and 
.81 respectively. One wonders why a test 
with such low reliability should be pub- 
lished commercially. One might also hope for 
some evidence that the new test is predictive 
of something. 
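The reliabilities quoted are Spearman-Brown (Brown-Spearman) estimates. The same formula, r_k = kr / (1 + (k - 1)r), predicts the reliability of a test lengthened k-fold with parallel material. That gives one way to gauge how far short the grade-1 form falls. The sketch below is only an illustration and assumes any added items would be parallel to the existing ones. The .90 target is an arbitrary choice, not a figure from the review.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test of reliability r is lengthened
    k-fold with parallel material."""
    return k * r / (1 + (k - 1) * r)

def length_factor_needed(r: float, target: float) -> float:
    """Inverse of spearman_brown: the lengthening factor k needed to raise
    reliability r to the target value."""
    return target * (1 - r) / (r * (1 - target))

# Grade-1 form: reliability .68 with 60 minutes of testing.
print(f"{spearman_brown(0.68, 2):.2f}")            # 0.81 if the test were doubled
k = length_factor_needed(0.68, 0.90)
print(f"k = {k:.2f}, about {60 * k:.0f} minutes")  # k = 4.24, about 254 minutes
```

Under these assumptions, reaching .90 would take roughly four times the present 60 minutes. That is impractical for first-graders and helps explain the reviewer's concern.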


Quinn McNemar 
Stanford University 


Committee on Colorimetry of the Optical So- 
ciety of America. The science of color. 
New York: Thomas Y. Crowell, 1953. Pp. 
385. $7.00. 

This is called “the definitive book on color 
for scientists, artists, manufacturers and stu- 
dents.” It has been in preparation by the 
Committee on Colorimetry, under the chair- 
manship of L. A. Jones, since 1932. The dis- 
cussion is introduced by a historical account 
of the use of color by prehistoric man and in 
ancient civilizations. This is followed by ac- 
counts of the philosophy of color, the anat- 
omy and physiology of color vision, and the 
psychology of color. The last three chapters 
are concerned with the psychophysics of color 
and colorimetry. 

Essentially, this book is a basic treatise on 
color and color vision dealing with the physi- 
cal, physiological, and psychological aspects 
of the problem. As might be expected in 
terms of the Committee’s work, physical concepts,
psychophysics of color and colorimetry,
are stressed although the psychological
aspects are not neglected.

Acquaintance with the contents of this 
treatise is essential for all those doing re- 
search in the field of color and color percep- 
tion. In the past, psychologists as well as 
others have been disturbed by the confusion 
and controversy in the field of color due to 
lack of clear definitions and precise distinc- 
tion between terms, particularly in areas 
where physics, psychophysics, and psychol- 
ogy overlap. A highly important and hope- 
fully a permanent contribution of this trea- 
tise is the proposal and consistent use of a set 
of terms which make clear distinctions be- 
tween similar concepts in these sciences. 

Applied psychologists will probably find 
most helpful the section dealing with the sen- 
sory aspects of color vision including classifi- 
cation of color deficiencies and evaluation of 
the more common tests of color vision.
Nevertheless, acquaintance with much of the
remaining materials will contribute to sounder
decisions in the selection and use of color by 
the applied psychologist. 

The authors have produced a relatively
readable book on a highly technical subject.
Most people interested in color, either casu- 
ally or professionally, will enjoy and profit 
by reading parts or all of this treatise. 


Miles A. Tinker 


University of Minnesota 





New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Dr. John G. Darley, 
408 Johnston Hall, University of Minnesota, Minneapolis 14, Minnesota. 


The anatomy of personality. Donald K. 
Adams. New York: Doubleday & Com- 
pany, 1954. Pp. 44. $.85. 

Elements of Rorschach interpretation. Rob- 
ert M. Allen. New York: International 
Universities Press, 1954. Pp. 242. $4.00. 

Adjustment to blindness. Mary K. Bau- 
mann. Philadelphia: Department of Wel- 
fare, Commonwealth of Pennsylvania, 1954. 
Pp. 198. 

On the nature of psychotherapy. Arnold 
Bernstein. New York: Doubleday & Com- 
pany, 1954. Pp. 36. $.85. 

The study of personality. Howard Brand. 
New York: John Wiley & Sons, 1954. Pp. 
581. $6.00. 

Group therapy for mothers of disturbed chil-
dren. Helen E. Durkin. Springfield, Ill.:
Charles C Thomas, 1954. Pp. 125. $3.50. 

In-laws: pro & con. Evelyn Millis Duvall. 
New York: Association Press, 1954. Pp. 
400. $3.95. 

A university looks at its program. Ruth E. 
Eckert and Robert J. Keller. (Eds.) 
Minneapolis: Univer. of Minnesota Press, 
1954. Pp. 223. $4.00. 

How to keep romance in your marriage. W. 
Clark Ellzey. New York: Association 
Press, 1954. Pp. 182. $2.95. 

The attack on big business. J. D. Glover. 
Cambridge: Harvard Univer. Press, 1954. 
Pp. 375. $4.00. 

Art and play therapy. Emery I. Gondor. 
New York: Doubleday & Company, 1954. 
Pp. 61. $.95. 

A primer of Freudian psychology. Calvin S. 
Hall. Cleveland: The World Publishing 
Company, 1954. Pp. 137. $2.50. 

The development of modern sociology. 
Roscoe C. Hinkle, Jr., and Gisela J. Hinkle. 
New York: Doubleday & Company, 1954. 
Pp. 75. $.95. 

Nebraska symposium on motivation. Marshall
R. Jones. (Ed.) Lincoln: Univer.
of Nebraska Press, 1954. Pp. 322. $3.00. 

The development of personality. C. G. Jung. 
New York: Bollingen Series, 1954. Pp. 
235. $3.75. 

What is electroshock therapy? Edward F. 
Kerman. New York: Exposition Press, 
1954. Pp. 152. $3.50. 




Handbook of social psychology. Vol. I. 
Theory and method. Gardner Lindzey. 
(Ed.) Cambridge: Addison-Wesley Pub- 
lishing Company, 1954. Pp. 588. $8.50. 

Handbook of social psychology. Vol. II.
Special fields and applications. Gardner 
Lindzey. (Ed.) Cambridge: Addison- 
Wesley Publishing Company, 1954. Pp. 
1226. $8.50.

The social background of political decision- 
makers. Donald R. Matthews. New York:
Doubleday & Company, 1954. Pp. 71. 
$.95. 

The concept of schizophrenia. W. F. Mc- 
Auley. New York: Philosophical Library, 
1954. Pp. 145. $3.75. 

Educating women for a changing world. Kate 
Hevner Mueller. Minneapolis: Univer. of 
Minnesota Press, 1954. Pp. 302. $4.75. 

Religion and society. Elizabeth K. Notting- 
ham. New York: Doubleday & Company, 
1954. Pp. 84. $.95. 

Mathematics and plausible reasoning. Vol. I. 
Induction and analogy in mathematics. G. 
Polya. Princeton: Princeton Univer. Press, 
1954. Pp. 280. $5.50. 

Mathematics and plausible reasoning. Vol. 
II. Patterns of plausible inference. G. 
Polya. Princeton: Princeton Univer. Press, 
1954. Pp. 190. $4.50. 

Thinking and speaking. G. Revesz. (Ed.) 
Amsterdam: North Holland Publishing 
Company, 1954. Pp. 205. $4.00. 

A survey of clinical practice in psychology. 
Eli A. Rubenstein and Maurice Lorr. 
(Eds.) New York: International Uni- 
versities Press, 1954. Pp. 363. $6.00. 

Modern experiments in telepathy. S. G. Soal 
and F. Bateman. New Haven: Yale Uni- 
ver. Press, 1954. Pp. 425. $5.00. 

Decision processes. R. M. Thrall, C. H. 
Coombs, and R. L. Davis. (Eds.) New 
York: John Wiley & Sons, 1954. Pp. 332. 
$5.00. 

Interpreting social change in America. Nor- 
man F. Washburne. New York: Double- 
day & Company, 1954. Pp. 50. $.95. 

A study of participation in college activities. 
E. G. Williamson, W. L. Layton, and M. L. 
Snoke. Minneapolis: Univer. of Minnesota 
Press, 1954. Pp. 99. $2.25. 





BULLETINS ON PERSONNEL EVALUATION 


COMPLIMENTARY TO INDUSTRIAL, PERSONNEL, APPLIED PSYCHOLOGISTS 


(1) STEPS IN HIRING. Recommends thirteen steps for scientific processing of an 
applicant through the employment procedures. 7 pages. 


(2) DESCRIPTION OF BIOGRAPHICAL FIELDS. Descriptions and directive 
questions for the interviewer in thirteen biographical-psychological fields. 10 pages. 


(3) PERSONALITY EVALUATION OF EMPLOYEES. Describes the basic
[...] factors and their relation to job success-failure in business and industry.
[...] pages.

(4) STEPS IN MERIT RATING PROGRAM. Outlines nine steps for a company 
merit rating program, including supervisory training, administrative decisions, employee 
progress review. 20 pages. 

(5) STEPS IN PERSONNEL TEST PROGRAM. Ten recommended steps for 
a company program in aptitude testing, including a series of nine case studies. 12 pages. 


(6) SOURCES IN PERSONNEL MANAGEMENT. Lists names and addresses 
of 147 journals, 53 associations, 67 book and test publishers, and 106 industrial relations 
centers. 15 pages. 


The above bulletins will be sent without charge to industrial, personnel 
or applied psychologists. Please make your request on organization 
letterhead, stating names of bulletins desired, and that your request is
made as a Journal of Applied Psychology reader. Send to Industrial 
Psychology, Inc., Box 6, Arizona. 


Industrial Psychology, Inc. [...] the Application-Interview Series, Aptitude Job-Tests Program,
[...], Merit Rating Series, Vocational Guidance Packet, University Packet.
Professional [...]: Chicago, Washington, D.C., Buffalo, Grand Rapids, Milwaukee,
Kansas City, Mobile, Denver, San Francisco, Los Angeles, Montreal, Toronto, Honolulu.

















Books 


PERSONNEL AND INDUSTRIAL PSYCHOLOGY 
New Second Edition 


By Edwin E. GHISELLI and CLARENCE W. BROWN, University of California,
Berkeley. McGraw-Hill Series in Psychology. 506 pages, $6.00 


The [...] of this book is to provide a comprehensive treatment of personnel and industrial
[...] and information relative to [...] and procedures that have a bearing on the
utilization of manpower. The second edition is thoroughly revised and [...]
new chapters on selection and classification of workers and on social factors in industry.
[...] principles rather than practices are emphasized throughout, and the importance of [...]


PSYCHOLOGY FOR LIFE 


By HARRY RUJA, San Diego State College. In press 


An outstanding survey of the field of psychology appropriate for the first course in colleges and
universities. It is student-centered in content and [...] and prominently treats the [...]
of great interest in such courses. This includes promoting efficient college [...],
[...] vocation intelligently, cultivating skill in [...], building social skills, [...]
anger, and [...] a healthy mind. Order of presentation is from concrete to abstract,
practical to theoretical, and familiar to strange.


TEXTBOOK OF SALESMANSHIP. New Fifth Edition 


By Frederic A. RUSSELL and FRANK H. BEACH, University of Illinois. 552
pages, $5.75 


An excellent revision of what is by far the most successful book in the field. The general purpose
is [...] the same as before: to offer a broad [...]
basic salesmanship course [...] types of selling. The treatment of the
psychological aspects of selling is greatly [...]. Written in a readable and interesting
style, it is well organized, logical in [...], and eminently teachable.


MOTIVATION RESEARCH IN ADVERTISING 
AND MARKETING 


By George Horsley SMITH, Rutgers University. 254 pages, $5.00


This volume, the second in the series sponsored by the Advertising Research Foundation, is an
introduction and guide to the best possible [...] of the social sciences in the [...] complex
problems of advertising and marketing fact finding. It will aid advertisers, advertising media
and agencies, and sales and marketing personnel in [...] research techniques properly.
It shows how motivation studies should be [...], and, in particular, covers the application
of projective psychological techniques to advertising and selling.